nvme-pci: Disabling device after reset failure: -5 occurs while AER recovery

Keith Busch kbusch at kernel.org
Sat Mar 11 08:46:03 PST 2023


On Sat, Mar 11, 2023 at 09:22:20AM +0100, Lukas Wunner wrote:
> >   - Switch and NVMe MPS are 512B
> >   - NVMe config space saved (including MPS=512B)
> >   - You change Switch MPS to 128B
> >   - NVMe does DMA with payload > 128B
> >   - Switch reports Malformed TLP because TLP is larger than its MPS
> >   - Recovery resets NVMe, which sets MPS to the default of 128B
> >   - nvme_slot_reset() restores NVMe config space (MPS is now 512B)
> >   - Subsequent NVMe DMA with payload > 128B repeats cycle
> 
> Forgive my ignorance, but if MPS is restored to 512B by nvme_slot_reset(),
> shouldn't the communication with the device just work again from that
> point on?

The upstream port was tuned down to 128B without coordinating with the kernel,
so restoring the NVMe device's saved MPS of 512B recreates the mismatch.
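
To make the cycle concrete, here is a minimal sketch that models only the
Max_Payload_Size field of the PCIe Device Control register (bits 7:5, encoded
as 128 << value). It is an illustration, not kernel code: the helper names
(mps_bytes, set_mps, may_send_oversized) and the variables are hypothetical,
and only the mask value mirrors PCI_EXP_DEVCTL_PAYLOAD from the kernel's
pci_regs.h.

```python
# Illustrative model of the MPS mismatch; not kernel code.
PCI_EXP_DEVCTL_PAYLOAD = 0x00E0  # Max_Payload_Size field, Device Control bits 7:5


def mps_bytes(devctl: int) -> int:
    """Decode the Max_Payload_Size field of a Device Control value into bytes."""
    return 128 << ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5)


def set_mps(devctl: int, size: int) -> int:
    """Return devctl with the MPS field set to encode `size` bytes."""
    if size not in (128, 256, 512, 1024, 2048, 4096):
        raise ValueError(f"invalid MPS: {size}")
    encoding = (size // 128).bit_length() - 1  # 128 -> 0, 256 -> 1, 512 -> 2, ...
    return (devctl & ~PCI_EXP_DEVCTL_PAYLOAD) | (encoding << 5)


def may_send_oversized(sender_devctl: int, receiver_devctl: int) -> bool:
    """True if the sender is allowed to emit TLPs larger than the receiver's MPS.

    A receiver treats such a TLP as malformed.
    """
    return mps_bytes(sender_devctl) > mps_bytes(receiver_devctl)


# Switch and NVMe both start at 512B; the NVMe config space is saved.
switch_devctl = set_mps(0, 512)
nvme_devctl = set_mps(0, 512)
saved_nvme_devctl = nvme_devctl

# The switch port is changed to 128B behind the kernel's back.
switch_devctl = set_mps(switch_devctl, 128)
mismatch_before_reset = may_send_oversized(nvme_devctl, switch_devctl)

# Recovery resets the NVMe device, which returns its MPS to the 128B default.
nvme_devctl = set_mps(nvme_devctl, 128)
mismatch_after_reset = may_send_oversized(nvme_devctl, switch_devctl)

# Restoring the saved config space puts the NVMe device back at 512B.
nvme_devctl = saved_nvme_devctl
mismatch_after_restore = may_send_oversized(nvme_devctl, switch_devctl)
```

In this model the device and the port agree only in the window between the
reset and the config space restore. The restore writes back a value that was
valid when it was saved but no longer matches the upstream port.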


