nvme nvme0: I/O 0 (I/O Cmd) QID 1 timeout, aborting, source drive corruption observed

Christoph Hellwig hch at lst.de
Thu Dec 15 00:23:44 PST 2022


On Thu, Dec 15, 2022 at 10:38:33AM +0900, J. Hart wrote:
> I am attempting to load an nvme device (nvme0n1) to use as main system 
> drive using the following command:
>
> rsync -axvH /. --exclude=/lost+found --exclude=/var/log.bu 
> --exclude=/usr/var/log.bu --exclude=/usr/X11R6/var/log.bu 
> --exclude=/home/jhart/.cache/mozilla/firefox/are7uokl.default-release/cache2.bu 
> --exclude=/home/jhart/.cache/thunderbird/7zsnqnss.default/cache2.bu 
> /mnt/root_new 2>&1 | tee root.log
>
> The total transfer would be approximately 50 GB.  This is being done at run 
> level 1, and only the kernel threads and the root shell are observed to be 
> active.
>
> The following log messages appear after a minute or so, and rsync hangs. 
> The nvme drive cannot be unmounted without a reboot.

Ok, this looks like the drive has firmware / hardware problems and
can't cope with the load.

>
> dmesg reports the following:

nvme0 is the destination drive I guess?

>
> [Dec14 19:24] nvme nvme0: I/O 0 (I/O Cmd) QID 1 timeout, aborting

Can you enable CONFIG_NVME_VERBOSE_ERRORS so that we can see what
commands are hanging?
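
A minimal sketch for checking whether the running kernel was built with
that option. The helper name check_kconfig is hypothetical, and the
config file location is an assumption: many distributions install
/boot/config-$(uname -r), others only expose /proc/config.gz.

```shell
# Report whether each named option is set to y or m in a kernel config file.
# Usage: check_kconfig <config-file> <option>...
check_kconfig() {
    cfg=$1
    shift
    for opt in "$@"; do
        # A line of the form OPTION=y or OPTION=m means the option is enabled;
        # "# OPTION is not set" or a missing line means it is not.
        if grep -q "^${opt}=[ym]\$" "$cfg"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
}

# Example (the path is an assumption, adjust for your system):
# check_kconfig "/boot/config-$(uname -r)" CONFIG_NVME_VERBOSE_ERRORS
```

If the option is reported as not enabled, the kernel has to be rebuilt
with it set before the verbose command decoding shows up in dmesg.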

> I have also observed file system corruption on the source drive of the 
> transfer.  I would not normally think this to be related, except that after 
> the first time I observed it, I made certain that I corrected the file 
> content before any additional attempts, but have seen this again after 
> every attempt.  The modification dates and file sizes did not change, but 
> the file content on the source drive did.  I confirmed this using the 
> "diff" utility, and again using a rsync dry run with the check sum test 
> enabled.

Ok, that's really odd.  The only way I could think of that happening
is if the driver does stray DMAs, which would be really grave.

Do you have CONFIG_INTEL_IOMMU and CONFIG_INTEL_IOMMU_DEFAULT_ON enabled?
If not, it would be good to enable those to see if the iommu catches
any stray DMAs.
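
Once the iommu is active, a rough way to look for blocked DMAs is to
filter the kernel log. This sketch assumes fault reports carry the
"DMAR:" prefix used by the Intel iommu code and contain the word
"fault"; the function name dmar_faults is hypothetical, and the exact
message text varies between kernel versions.

```shell
# Print only the kernel log lines (read from stdin) that carry the
# "DMAR:" prefix and mention a fault. Returns non-zero if none match.
dmar_faults() {
    grep 'DMAR:' | grep -i 'fault'
}

# Example: run after reproducing the hang with the rsync load.
# dmesg | dmar_faults
```

No output only means no fault lines matched the filter; it does not by
itself prove that the iommu is enabled or that no stray DMA happened.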


