[LSF/MM/BPF TOPIC] Block-layer device resets
Damien Le Moal
dlemoal at kernel.org
Sun Feb 1 17:46:18 PST 2026
On 2/2/26 02:06, Hannes Reinecke wrote:
> Hi all,
>
> We are currently working on implementing cross-controller resets for
> NVMe, which requires sending a command to the target which then should
> terminate all commands on a given controller.
> While we could easily terminate the controller, the specification
> also requires us to terminate all outstanding commands.
> Which then recurses into my all-time favourite topic on how to
> abort outstanding commands from the fs/bio layer.
>
> However, here we don't have to dissect/match to individual commands,
> but rather have to abort everything, which seems rather easier.
>
> So I would like to fathom whether such a thing is feasible/reasonable
> (I think so, obviously, and can think of several other use-cases, too,
> qemu springs to mind here ...) and discuss possible implementations
> (set 'req->deadline' to zero for all pending commands?).
> Or maybe we can do such a thing already and I'm just not aware of it...

Hmmm... Command timeouts? E.g. if a controller is slow to respond (send
completions), the block layer timeout timer may trigger, which will call into
the low-level device driver to force a reset. But before the reset actually
happens, completions may still come back, and we do handle that race
correctly, well, at least for scsi/ata.

Your scenario sounds very similar to this: once you reset the controller,
whatever was pending will be silent and can be aborted or retried. So it does
sound like that should not be too difficult, no? Generalize the timeout
processing or do something similar?
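
To make the race concrete, here is a rough userspace model of the idea, not
kernel code and not how blk-mq or libata actually implement it. All names
(model_rq, rq_claim, reset_abort_all, the RQ_* states) are made up for
illustration. The only point is that completion, timeout and a reset-wide
abort all compete for the same single state transition, so exactly one of
them wins per request and a late completion is simply ignored.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request states for this userspace model. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE, RQ_ABORTED };

/* Hypothetical stand-in for a request; only the state matters here. */
struct model_rq {
	_Atomic int state;
};

/*
 * Try to move an in-flight request to new_state. The compare-and-exchange
 * succeeds for exactly one caller, so completion and abort cannot both win.
 */
static bool rq_claim(struct model_rq *rq, int new_state)
{
	int expected = RQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected, new_state);
}

/* Completion path: returns false if the request was already claimed. */
static bool rq_complete(struct model_rq *rq)
{
	return rq_claim(rq, RQ_COMPLETE);
}

/* Timeout path for a single request: same claim, different final state. */
static bool rq_timeout(struct model_rq *rq)
{
	return rq_claim(rq, RQ_ABORTED);
}

/*
 * Controller reset: abort everything that is still in flight.
 * Returns the number of requests this call actually aborted.
 */
static size_t reset_abort_all(struct model_rq *rqs, size_t n)
{
	size_t aborted = 0;

	for (size_t i = 0; i < n; i++)
		if (rq_claim(&rqs[i], RQ_ABORTED))
			aborted++;
	return aborted;
}
```

With three in-flight requests where one completes before the reset,
reset_abort_all() aborts the remaining two, and a completion arriving
afterwards for an aborted request returns false and changes nothing.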
--
Damien Le Moal
Western Digital Research