IRQ/nvme_pci_complete_rq: NULL pointer dereference yet again

Keith Busch keith.busch at intel.com
Fri Apr 6 11:04:45 PDT 2018


On Fri, Apr 06, 2018 at 12:46:06PM -0500, Alex G. wrote:
> On 04/06/2018 12:16 PM, Scott Bauer wrote:
> > You're using AER inject, right?
> 
> No. I'm causing the errors in hardware with hot-unplug.

I think Scott's still on the right track for this particular sighting.
The AER handler looks unsafe under changing topologies. It might need to run
under pci_lock_rescan_remove() before walking the bus to prevent races
with the surprise removal, but it's not clear to me yet if holding that
lock is okay to do in this context.
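
To make the race concrete, here is a rough userspace model of the idea, not
kernel code and not a proposed patch. Only pci_lock_rescan_remove() is a real
kernel interface; the pthread mutex stands in for it, and every fake_* name
below is made up for illustration. The walker and the removal path take the
same lock, so a device cannot be unlinked and freed in the middle of a walk.
Whether the real AER path may sleep on such a lock is exactly the open
question above, and this sketch does not answer it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lock taken by pci_lock_rescan_remove(). */
static pthread_mutex_t fake_rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device and bus types; the real ones are far richer. */
struct fake_dev {
	int id;
	struct fake_dev *next;
};

struct fake_bus {
	struct fake_dev *devices;
};

/* Hot-add: link a new device at the head of the bus list. */
static int fake_bus_add(struct fake_bus *bus, int id)
{
	struct fake_dev *dev;

	assert(bus != NULL);
	dev = malloc(sizeof(*dev));
	if (!dev)
		return -1;
	dev->id = id;

	pthread_mutex_lock(&fake_rescan_remove_lock);
	dev->next = bus->devices;
	bus->devices = dev;
	pthread_mutex_unlock(&fake_rescan_remove_lock);
	return 0;
}

/* Surprise removal: unlink under the lock, free once unlinked. */
static int fake_bus_remove(struct fake_bus *bus, int id)
{
	struct fake_dev **pp, *dev = NULL;
	int found;

	assert(bus != NULL);
	pthread_mutex_lock(&fake_rescan_remove_lock);
	for (pp = &bus->devices; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			dev = *pp;
			*pp = dev->next;
			break;
		}
	}
	pthread_mutex_unlock(&fake_rescan_remove_lock);

	found = dev != NULL;
	free(dev);
	return found ? 0 : -1;
}

/* Example callback: accumulate device ids into an int. */
static void fake_report_error(struct fake_dev *dev, void *data)
{
	*(int *)data += dev->id;
}

/*
 * Error-handler walk: hold the lock for the whole traversal so that
 * fake_bus_remove() cannot free a device the walker is looking at.
 * Returns the number of devices visited.
 */
static int fake_bus_walk(struct fake_bus *bus,
			 void (*cb)(struct fake_dev *, void *), void *data)
{
	struct fake_dev *dev;
	int visited = 0;

	assert(bus != NULL);
	pthread_mutex_lock(&fake_rescan_remove_lock);
	for (dev = bus->devices; dev; dev = dev->next) {
		if (cb)
			cb(dev, data);
		visited++;
	}
	pthread_mutex_unlock(&fake_rescan_remove_lock);
	return visited;
}
```

Without the lock in fake_bus_walk(), a concurrent fake_bus_remove() could free
the node the walker is about to dereference, which is the kind of failure a
bus walk racing with hot-unplug would produce.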

This, however, does not appear to resemble your previous sightings. In
those, it looks like something has lost track of commands, and we're
freeing their resources a second time.


