nvme-pci: Disabling device after reset failure: -5 occurs while AER recovery
Keith Busch
kbusch at kernel.org
Sat Mar 11 08:46:03 PST 2023
On Sat, Mar 11, 2023 at 09:22:20AM +0100, Lukas Wunner wrote:
> > - Switch and NVMe MPS are 512B
> > - NVMe config space saved (including MPS=512B)
> > - You change Switch MPS to 128B
> > - NVMe does DMA with payload > 128B
> > - Switch reports Malformed TLP because TLP is larger than its MPS
> > - Recovery resets NVMe, which sets MPS to the default of 128B
> > - nvme_slot_reset() restores NVMe config space (MPS is now 512B)
> > - Subsequent NVMe DMA with payload > 128B repeats cycle
>
> Forgive my ignorance, but if MPS is restored to 512B by nvme_slot_reset(),
> shouldn't the communication with the device just work again from that
> point on?
The upstream port was tuned down to 128B without coordinating with the kernel,
so restoring the NVMe to 512B recreates the mismatch: the device is again
allowed to send TLPs larger than the switch will accept.
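
If it helps, here is a toy model of that cycle. It is only a sketch, not
kernel code: the Switch and Nvme classes, their methods and simulate() are
all made up for illustration, and the one rule it encodes is that a receiver
treats a TLP with a payload larger than its own MPS as malformed.

```python
DEFAULT_MPS = 128  # MPS a PCIe function comes up with after reset


class Switch:
    """Hypothetical stand-in for the upstream switch port."""

    def __init__(self, mps):
        self.mps = mps

    def receive(self, payload_bytes):
        # A TLP whose payload exceeds the receiver's MPS is malformed.
        return "ok" if payload_bytes <= self.mps else "malformed TLP"


class Nvme:
    """Hypothetical stand-in for the NVMe function."""

    def __init__(self, mps):
        self.mps = mps
        self.saved_mps = mps  # config space snapshot taken before the change

    def reset(self):
        self.mps = DEFAULT_MPS

    def restore_config(self):
        # Models the restore step: the stale snapshot is written back.
        self.mps = self.saved_mps

    def dma(self, switch, nbytes):
        # The device sizes each TLP by its own MPS, not the switch's.
        return switch.receive(min(nbytes, self.mps))


def simulate(rounds=3):
    switch = Switch(mps=512)
    nvme = Nvme(mps=512)
    switch.mps = 128  # changed behind the kernel's back
    log = []
    for _ in range(rounds):
        result = nvme.dma(switch, 4096)
        log.append(result)
        if result != "ok":
            nvme.reset()           # device is back at 128B
            nvme.restore_config()  # ...and straight back to 512B
    return log


if __name__ == "__main__":
    print(simulate())
```

In this model every round fails the same way, because the restore step always
puts back the stale 512B. If the device were left at the 128B reset default,
the same DMA would go through.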