[LSF/MM/BPF TOPIC] Block-layer device resets

Damien Le Moal dlemoal at kernel.org
Sun Feb 1 17:46:18 PST 2026


On 2/2/26 02:06, Hannes Reinecke wrote:
> Hi all,
> 
> We are currently working on implementing cross-controller resets for
> NVMe, which requires sending a command to the target, which then should
> terminate all commands on a given controller.
> While we could easily terminate the controller, the specification
> also requires us to terminate all outstanding commands.
> This then recurses into my all-time favourite topic: how to
> abort outstanding commands from the fs/bio layer.
> 
> However, here we don't have to dissect/match to individual commands,
> but rather have to abort everything, which seems rather easier.
> 
> So I would like to fathom whether such a thing is feasible/reasonable
> (I think so, obviously, and can think of several other use-cases, too,
> qemu springs to mind here ...) and discuss possible implementations
> (set 'req->deadline' to zero for all pending commands?).
> Or maybe we can do such a thing already and I'm just not aware of it...

Hmmm... Command timeouts ? E.g. if a controller is slow to respond (send
completions), the block layer timeout timer may trigger, which will call into
the low level device driver to force a reset. But before the reset actually
happens, completions may actually come back, and we do handle that race
correctly, well at least for scsi/ata.

Your scenario sounds very similar to this: once you reset the controller,
whatever was pending will be silent and can be aborted or retried. So it does
sound like that should not be too difficult, no ? Generalize the timeout
processing or do something similar ?


-- 
Damien Le Moal
Western Digital Research



More information about the Linux-nvme mailing list