[RFC PATCH] nvme: prevent hang on surprise removal of NVMe disk
Hannes Reinecke
hare at suse.de
Wed Feb 16 03:32:37 PST 2022
On 2/16/22 12:18, Markus Blöchl wrote:
> On Tue, Feb 15, 2022 at 08:17:31PM +0100, Christoph Hellwig wrote:
>> On Mon, Feb 14, 2022 at 10:51:07AM +0100, Markus Blöchl wrote:
>>> After the surprise removal of a mounted NVMe disk the pciehp task
>>> reliably hangs forever with a trace similar to this one:
>>
>> Do you have a specific reproducer? At least with doing a
>>
>> echo 1 > /sys/.../remove
>>
>> while running fsx on a file system, I can't actually reproduce it.
>
> We built our own enclosures with a custom connector to plug in the disks.
>
> So an external Thunderbolt enclosure is probably very similar
> (or just ripping an unscrewed NVMe drive out of its M.2 slot ...).
>
> But as already suggested, QEMU might also be very useful here, as it
> also allows us to test multiple namespaces and multipath I/O, if you or
> someone else wants to check those too (hotplug with multipath I/O really scares me).
>
Nothing to be scared of.
I've tested this extensively in the run-up to commit 5396fdac56d8
("nvme: fix refcounting imbalance when all paths are down") which,
incidentally, is something you need if you want to test things.
Let me see if I can dig up the testbed.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer