nvme nvme0: I/O 0 (I/O Cmd) QID 1 timeout, aborting, source drive corruption observed

Christoph Hellwig hch at lst.de
Tue Dec 20 23:50:57 PST 2022


On Tue, Dec 20, 2022 at 09:56:23AM -0700, Keith Busch wrote:
> Though I am skeptical, Christoph seemed to also think there was a
> possibility you hit a real kernel issue with your setup, but I don't
> know if he has any ideas other than enabling KASAN to see if that
> catches anything.

Sorry for the delay, caught the nasty cold bugs circulating everywhere
and was mostly knocked out for a couple of days.

I can't really think of anything specific, but when we see random
memory corruption, there are basically two major options:

 - something DMAing where it should not.  In general an IOMMU should
   catch that if it is actually enabled.  I think Keith rightly questioned
   if VT-d is actually running here and not disabled by the BIOS, and
   I don't remember a dmesg disproving that.  Even with that, there
   could be some devices opting out of the IOMMU in the BIOS.
 - the kernel overwriting random data.  This should be really rare, but
   could happen and KASAN should catch it.  But I really have no idea
   what it would be.
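
As a rough illustration of the first point, here is a minimal sketch of
checking from sysfs whether the IOMMU is in use and which PCI devices are
not covered by it.  The function name iommu_report is made up for this
example.  It assumes the usual sysfs layout: one directory per group under
/sys/kernel/iommu_groups, an optional "type" file per group (a value of
"identity" would mean DMA is passed through untranslated), and an
"iommu_group" link under each PCI device that belongs to a group.  It is
not a substitute for reading the dmesg output.

```python
import os


def iommu_report(sysfs="/sys"):
    """Summarise IOMMU state from sysfs (layout is an assumption, see above).

    Returns (groups, ungrouped):
      groups    - dict mapping IOMMU group name to its domain type string
      ungrouped - sorted list of PCI devices with no iommu_group link
    """
    groups_dir = os.path.join(sysfs, "kernel", "iommu_groups")
    pci_dir = os.path.join(sysfs, "bus", "pci", "devices")

    groups = {}
    if os.path.isdir(groups_dir):
        for name in sorted(os.listdir(groups_dir)):
            type_path = os.path.join(groups_dir, name, "type")
            try:
                with open(type_path) as f:
                    groups[name] = f.read().strip()
            except OSError:
                # Older kernels may not expose the "type" attribute.
                groups[name] = "unknown"

    ungrouped = []
    if os.path.isdir(pci_dir):
        for dev in sorted(os.listdir(pci_dir)):
            if not os.path.exists(os.path.join(pci_dir, dev, "iommu_group")):
                ungrouped.append(dev)

    return groups, ungrouped


if __name__ == "__main__":
    groups, ungrouped = iommu_report()
    if not groups:
        print("no IOMMU groups found; the IOMMU is probably not active")
    for name, kind in groups.items():
        print(f"group {name}: {kind}")
    for dev in ungrouped:
        print(f"not in any IOMMU group: {dev}")
```

An empty group list would suggest VT-d is not actually running, and any
device in an "identity" group or in no group at all would be a candidate
for stray DMA that the IOMMU cannot catch.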


