nvme nvme0: I/O 0 (I/O Cmd) QID 1 timeout, aborting, source drive corruption observed

J. Hart jfhart085 at gmail.com
Thu Dec 15 05:33:30 PST 2022


Here are the test results.  My apologies for the delay, as I was being 
rather careful.  The test also appears to have resulted in very serious 
filesystem damage on the system main drive which had to be dealt with.

These results were obtained as before, at run level 1 with all processes 
terminated except for the root shell under initd.  I then started 
syslogd and then started klogd.  I ran vim under the root shell, and a 
terminal from with vim.  The rsync was run from that terminal, and the 
condition monitored via top within another vim terminal.

The settings I enabled were as follows:

CONFIG_NVME_VERBOSE_ERRORS=y
CONFIG_INTEL_IOMMU=y

you also mentioned the following:
CONFIG_INTEL_IOMMU_DEFAULT

I didn't have that parameter, but I think you meant this one:
CONFIG_INTEL_IOMMU_DEFAULT_ON=y

The rsync invocation hung as before, and had to be forcibly terminated.
The system was forced down via the magic SysReq key sequence.

I have appended the log extracts here.  If these are unusable, I can 
send the extracts as attachments or upload them as seperate files.


dmesg log extract:

[Dec15 21:06] printk: udevd: 5 output lines suppressed due to ratelimiting
[Dec15 21:14] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data 
mode. Quota mode: disabled.
[Dec15 21:16] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #53085366: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[Dec15 21:17] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #5506059: comm rsync: pblk 0 bad header/extent: invalid eh_entries 
- magic f30a, entries 17, max 4(4), depth 0(0)
[  +8.145700] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #5506082: comm rsync: pblk 0 bad header/extent: invalid eh_entries 
- magic f30a, entries 17, max 4(4), depth 0(0)
[  +7.668302] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41180419: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[  +0.081111] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41180440: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[  +0.037526] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41180229: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[  +0.011292] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41182425: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[  +0.031661] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41178319: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[Dec15 21:32] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #38936868: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[  +0.000629] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #38936901: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[Dec15 21:34] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41029460: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[ +22.029135] EXT4-fs error (device dm-0): ext4_ext_check_inode:520: 
inode #41029305: comm rsync: pblk 0 bad header/extent: invalid 
eh_entries - magic f30a, entries 17, max 4(4), depth 0(0)
[ +26.890018] nvme nvme0: I/O 0 (Write) QID 1 timeout, aborting
[Dec15 21:35] nvme nvme0: I/O 0 QID 1 timeout, reset controller
[ +30.719998] nvme nvme0: I/O 13 QID 0 timeout, reset controller
[Dec15 21:38] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[  +0.014796] nvme nvme0: Abort status: 0x371
[Dec15 21:40] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[  +0.000024] nvme nvme0: Removing after probe failure status: -19
[Dec15 21:42] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[  +0.000324] nvme0n1: detected capacity change from 976773168 to 0
[  +0.000010] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558258 starting block 27121876)
[  +0.000046] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558302 starting block 27335746)
[  +0.000061] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558257 starting block 27199501)
[  +0.000012] Buffer I/O error on device nvme0n1p3, logical block 27104980
[  +0.000006] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558301 starting block 26915262)
[  +0.000010] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558287 starting block 27335718)
[  +0.000006] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558288 starting block 27335719)
[  +0.000003] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558289 starting block 27335721)
[  +0.000005] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558290 starting block 27335723)
[  +0.000006] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558291 starting block 27335724)
[  +0.000017] Buffer I/O error on device nvme0n1p3, logical block 27104981
[  +0.000006] EXT4-fs warning (device nvme0n1p3): ext4_end_bio:343: I/O 
error 10 writing to inode 6558257 starting block 27198798)
[  +0.000193] Buffer I/O error on device nvme0n1p3, logical block 27104982
[  +0.000018] Buffer I/O error on device nvme0n1p3, logical block 27104983
[  +0.000019] Buffer I/O error on device nvme0n1p3, logical block 27104984
[  +0.000031] Buffer I/O error on device nvme0n1p3, logical block 27318850
[  +0.000028] Buffer I/O error on device nvme0n1p3, logical block 26898366
[  +0.000020] Buffer I/O error on device nvme0n1p3, logical block 26898367
[  +0.000018] Buffer I/O error on device nvme0n1p3, logical block 26898368
[  +0.000019] Buffer I/O error on device nvme0n1p3, logical block 26898369
[  +0.000092] Aborting journal on device nvme0n1p3-8.
[  +0.000057] Buffer I/O error on dev nvme0n1p3, logical block 60850176, 
lost sync page write
[  +0.000057] EXT4-fs error (device nvme0n1p3): 
ext4_journal_check_start:83: comm kworker/u8:2: Detected aborted journal
[  +0.000022] JBD2: I/O error when updating journal superblock for 
nvme0n1p3-8.
[  +0.000093] Buffer I/O error on dev nvme0n1p3, logical block 0, lost 
sync page write
[  +0.000029] EXT4-fs (nvme0n1p3): I/O error while writing superblock
[  +0.000017] EXT4-fs (nvme0n1p3): Remounting filesystem read-only
[  +0.000018] EXT4-fs (nvme0n1p3): ext4_writepages: jbd2_start: 12288 
pages, ino 6558303; err -30
[  +0.000377] Buffer I/O error on dev nvme0n1p3, logical block 26214412, 
lost async page write
[  +0.000030] Buffer I/O error on dev nvme0n1p3, logical block 26214705, 
lost async page write
[  +0.000023] Buffer I/O error on dev nvme0n1p3, logical block 26214706, 
lost async page write
[  +0.000026] Buffer I/O error on dev nvme0n1p3, logical block 26214707, 
lost async page write
[  +0.000024] Buffer I/O error on dev nvme0n1p3, logical block 26214708, 
lost async page write
[  +0.000021] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558291, error -30)
[  +0.000011] Buffer I/O error on dev nvme0n1p3, logical block 26214709, 
lost async page write
[  +0.000026] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558256, error -30)
[  +0.000018] Buffer I/O error on dev nvme0n1p3, logical block 26214710, 
lost async page write
[  +0.000047] Buffer I/O error on dev nvme0n1p3, logical block 26214711, 
lost async page write
[  +0.000305] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558292, error -30)
[  +0.000044] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558293, error -30)
[  +0.000036] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558294, error -30)
[  +0.000049] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558295, error -30)
[  +0.000047] EXT4-fs (nvme0n1p3): failed to convert unwritten extents 
to written extents -- potential data loss!  (inode 6558296, error -30)

=====================================================================

syslog extract:

Dec 15 21:08:13 DellXPS kernel: printk: udevd: 5 output lines suppressed 
due to ratelimiting
Dec 15 21:14:30 DellXPS kernel: EXT4-fs (nvme0n1p3): mounted filesystem 
with ordered data mode. Quota mode: disabled.
Dec 15 21:17:00 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #53085366: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:42 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #5506059: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:51 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #5506082: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:58 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41180419: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:58 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41180440: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:58 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41180229: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:58 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41182425: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:17:58 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41178319: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:32:43 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #38936868: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:32:43 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #38936901: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:34:03 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41029460: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:34:25 DellXPS kernel: EXT4-fs error (device dm-0): 
ext4_ext_check_inode:520: inode #41029305: comm rsync: pblk 0 bad 
header/extent: invalid eh_entries - magic f30a, entries 17, max 4(4), 
depth 0(0)
Dec 15 21:34:52 DellXPS kernel: nvme nvme0: I/O 0 (Write) QID 1 timeout, 
aborting
Dec 15 21:35:23 DellXPS kernel: nvme nvme0: I/O 0 QID 1 timeout, reset 
controller
Dec 15 21:35:54 DellXPS kernel: nvme nvme0: I/O 13 QID 0 timeout, reset 
controller
Dec 15 21:38:25 DellXPS kernel: nvme nvme0: Device not ready; aborting 
reset, CSTS=0x1
Dec 15 21:38:25 DellXPS kernel: nvme nvme0: Abort status: 0x371
Dec 15 21:40:25 DellXPS kernel: nvme nvme0: Device not ready; aborting 
reset, CSTS=0x1
Dec 15 21:40:25 DellXPS kernel: nvme nvme0: Removing after probe failure 
status: -19
Dec 15 21:42:26 DellXPS kernel: nvme nvme0: Device not ready; aborting 
reset, CSTS=0x1
Dec 15 21:42:26 DellXPS kernel: nvme0n1: detected capacity change from 
976773168 to 0
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558258 starting block 
27121876)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558302 starting block 
27335746)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558257 starting block 
27199501)
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 27104980
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558301 starting block 
26915262)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558287 starting block 
27335718)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558288 starting block 
27335719)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558289 starting block 
27335721)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558290 starting block 
27335723)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558291 starting block 
27335724)
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 27104981
Dec 15 21:42:26 DellXPS kernel: EXT4-fs warning (device nvme0n1p3): 
ext4_end_bio:343: I/O error 10 writing to inode 6558257 starting block 
27198798)
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 27104982
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 27104983
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 27104984
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 27318850
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 26898366
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 26898367
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 26898368
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on device nvme0n1p3, 
logical block 26898369
Dec 15 21:42:26 DellXPS kernel: Aborting journal on device nvme0n1p3-8.
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 60850176, lost sync page write
Dec 15 21:42:26 DellXPS kernel: EXT4-fs error (device nvme0n1p3): 
ext4_journal_check_start:83: comm kworker/u8:2: Detected aborted journal
Dec 15 21:42:26 DellXPS kernel: JBD2: I/O error when updating journal 
superblock for nvme0n1p3-8.
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 0, lost sync page write
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): I/O error while 
writing superblock
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): Remounting 
filesystem read-only
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): ext4_writepages: 
jbd2_start: 12288 pages, ino 6558303; err -30
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214412, lost async page write
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214705, lost async page write
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214706, lost async page write
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214707, lost async page write
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214708, lost async page write
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558291, error -30)
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214709, lost async page write
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558256, error -30)
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214710, lost async page write
Dec 15 21:42:26 DellXPS kernel: Buffer I/O error on dev nvme0n1p3, 
logical block 26214711, lost async page write
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558292, error -30)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558293, error -30)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558294, error -30)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558295, error -30)
Dec 15 21:42:26 DellXPS kernel: EXT4-fs (nvme0n1p3): failed to convert 
unwritten extents to written extents -- potential data loss!  (inode 
6558296, error -30)


On 12/15/22 6:09 PM, Christoph Hellwig wrote:
> On Thu, Dec 15, 2022 at 06:07:32PM +0900, J. Hart wrote:
>> My apologies....I am not sure what is meant by "stay DMAs".  Is there
>> something I can look for here ?
> 
> I mean stray, sorry.  There isn't really anything you can look for.
> Either it really is a device problem, in which case the IOMMU should
> catch it.  Or it is a kernel problem somewhere, in which case
> CONFIG_KASAN would catch.  So maybe enable that as well, but it will
> slow down the kernel a LOT.

J. Hart



More information about the Linux-nvme mailing list