nvme nvme0: I/O 0 (I/O Cmd) QID 1 timeout, aborting, source drive corruption observed
Christoph Hellwig
hch at lst.de
Tue Dec 20 23:50:57 PST 2022
On Tue, Dec 20, 2022 at 09:56:23AM -0700, Keith Busch wrote:
> Though I am skeptical, Christoph seemed to also think there was a
> possibility you hit a real kernel issue with your setup, but I don't
> know if he has any ideas other than enabling KASAN to see if that
> catches anything.
Sorry for the delay, caught the nasty cold bugs circulating everywhere
and was mostly knocked out for a couple of days.
I can't really think of anything specific, but when we see random
memory corruption, there are basically two major options:
- something DMAing where it should not. In general an IOMMU should
catch that if it is actually enabled. I think Keith rightly questioned
if VT-d is actually running here and not disabled by the BIOS, and
I don't remember a dmesg disproving that. Even with that there
could be some devices opting out of the IOMMU in the BIOS.
- the kernel overwriting random data. This should be really rare, but
could happen and KASAN should catch it. But I really have no idea
what it would be.
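
As a rough illustration of the two checks above (is an IOMMU actually active, and is the running kernel built with KASAN), here is a minimal Python sketch. It is not from this thread. It assumes the usual sysfs location /sys/kernel/iommu_groups and that the kernel config can be read from /proc/config.gz or /boot/config-<release>; the function names are made up for this example. An empty group list only suggests that DMA remapping is not in use, and a device that appears in no group may be one that bypasses the IOMMU, so treat the output as a hint rather than proof.

```python
import gzip
import os
import platform


def iommu_groups(sysfs_root="/sys"):
    """Return the IOMMU group names found under <sysfs_root>/kernel/iommu_groups."""
    groups_dir = os.path.join(sysfs_root, "kernel", "iommu_groups")
    try:
        return sorted(os.listdir(groups_dir))
    except OSError:
        # Directory missing or unreadable: report no groups.
        return []


def grouped_devices(sysfs_root="/sys"):
    """Return the set of device names that belong to some IOMMU group."""
    devices = set()
    for group in iommu_groups(sysfs_root):
        dev_dir = os.path.join(sysfs_root, "kernel", "iommu_groups", group, "devices")
        try:
            devices.update(os.listdir(dev_dir))
        except OSError:
            pass
    return devices


def kasan_enabled(config_text):
    """True if the given kernel config text contains CONFIG_KASAN=y."""
    return any(line.strip() == "CONFIG_KASAN=y" for line in config_text.splitlines())


def read_kernel_config():
    """Try common locations for the running kernel's config; None if not found."""
    for path in ("/proc/config.gz", "/boot/config-" + platform.release()):
        try:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as f:
                return f.read()
        except OSError:
            continue
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    print("IOMMU groups:", len(groups))
    print("Devices behind an IOMMU group:", len(grouped_devices()))
    config = read_kernel_config()
    if config is None:
        print("Kernel config not found; cannot tell whether KASAN is built in")
    else:
        print("CONFIG_KASAN=y:", kasan_enabled(config))
```

If the group count is zero, the boot dmesg is still the place to confirm whether VT-d was actually initialized.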