[PATCH] nvme-pci: fix stuck reset on concurrent DPC and HP
Keith Busch
kbusch at kernel.org
Mon Mar 10 07:38:42 PDT 2025
On Sat, Mar 08, 2025 at 12:57:50PM +0530, Nilay Shroff wrote:
> On 3/7/25 8:54 PM, Keith Busch wrote:
> > If the pciehp removal starts after reset work's controller
> > initialization, then nothing stops the work from sending new admin
> > commands, and nothing will complete them. This causes the error_resume
> > to wait for something that will never happen.
>
> Ok makes sense, this appears to be a tight race condition and may not be
> limited to one platform. This should be possible even on PPC.
Yeah, I also thought it was a very unlikely condition to hit, but I have
a platform that deadlocks here almost every time a DPC event occurs.
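
For anyone following along, the hang described above can be modeled in a
few lines of user-space C. This is only a rough sketch, not driver code:
every name in it (toy_ctrl, toy_submit_admin, and so on) is hypothetical,
and the bounded poll loop is an assumption added so the example
terminates, whereas the real waiter has no such bound.

```c
#include <stdbool.h>

/*
 * Toy user-space model of the hang. Every name here is hypothetical;
 * none of this is nvme-pci code.
 */
struct toy_ctrl {
	bool device_present;	/* false once hot-plug removal has started */
	int outstanding;	/* admin commands submitted, not yet completed */
};

/* Reset work sends an admin command; nothing here checks for removal. */
static void toy_submit_admin(struct toy_ctrl *c)
{
	c->outstanding++;
}

/* Only a device that is still present can complete a command. */
static void toy_device_poll(struct toy_ctrl *c)
{
	if (c->device_present && c->outstanding > 0)
		c->outstanding--;
}

/*
 * Stand-in for the waiter. Polls at most max_polls times (an assumption
 * so the model terminates) and reports whether the work drained.
 */
static bool toy_wait_for_reset(struct toy_ctrl *c, int max_polls)
{
	for (int i = 0; i < max_polls && c->outstanding > 0; i++)
		toy_device_poll(c);
	return c->outstanding == 0;
}

/* One scenario: removal either starts after initialization or not at all. */
static bool toy_scenario(bool removed_after_init)
{
	struct toy_ctrl c = { .device_present = true, .outstanding = 0 };

	/* controller initialization has finished at this point */
	if (removed_after_init)
		c.device_present = false;	/* removal starts */
	toy_submit_admin(&c);			/* reset work keeps going */
	return toy_wait_for_reset(&c, 1000);
}
```

In this model toy_scenario(false) drains and returns true, while
toy_scenario(true) never drains: the command was accepted after removal
began and nothing is left to complete it, which is the condition the
waiter in error_resume gets stuck on.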