[RFC PATCH] nvme: prevent hang on surprise removal of NVMe disk

Hannes Reinecke hare at suse.de
Wed Feb 16 03:32:37 PST 2022


On 2/16/22 12:18, Markus Blöchl wrote:
> On Tue, Feb 15, 2022 at 08:17:31PM +0100, Christoph Hellwig wrote:
>> On Mon, Feb 14, 2022 at 10:51:07AM +0100, Markus Blöchl wrote:
>>> After the surprise removal of a mounted NVMe disk the pciehp task
>>> reliably hangs forever with a trace similar to this one:
>>
>> Do you have a specific reproducer? At least with doing a
>>
>> echo 1 > /sys/.../remove
>>
>> while running fsx on a file system I can't actually reproduce it.
> 
> We built our own enclosures with a custom connector to plug the disks.
> 
> So an external enclosure for thunderbolt is probably very similar.
> (or just ripping an unscrewed NVMe out of the M.2 ...)
> 
> But as already suggested, qemu might also be very useful here as it also
> allows us to test multiple namespaces and multipath I/O, if you/someone
> wants to check those too (hotplug with multipath I/O really scares me).
> 
Nothing to be scared of.
I've tested this extensively in the run-up to commit 5396fdac56d8 
("nvme: fix refcounting imbalance when all paths are down") which,
incidentally, is something you need if you want to test things.

Let me see if I can dig up the testbed.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare at suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
