regression: data corruption with ext4 on LUKS on nvme with torvalds master

Alex Xu (Hello71) alex_y_xu at yahoo.ca
Sat May 8 19:29:57 PDT 2021


Excerpts from Alex Xu (Hello71)'s message of May 8, 2021 1:54 pm:
> Hi all,
> 
> Using torvalds master, I recently encountered data corruption on my ext4 
> volume on LUKS on NVMe. Specifically, during heavy writes, the system 
> partially hangs; SysRq-W shows that processes are blocked in the kernel 
> on I/O. After forcibly rebooting, chunks of files are replaced with 
> other, unrelated data. I'm not sure exactly what the data is; some of it 
> is unknown binary data, but in at least one case, a list of file paths 
> was inserted into a file, indicating that the data is misdirected after 
> encryption.
> 
> This issue appears to affect files receiving writes in the temporal 
> vicinity of the hang, but affects both new and old data: for example, my 
> shell history file was corrupted up to many months before.
> 
> The drive reports no SMART issues.
> 
> I believe this is a regression in the kernel related to something merged 
> in the last few days, as it consistently occurs with my most recent 
> kernel versions, but disappears when reverting to an older kernel.
> 
> I haven't investigated further, such as by bisecting. I hope this is 
> sufficient information to give someone a lead on the issue, and if it is 
> a bug, nail it down before anybody else loses data.
> 
> Regards,
> Alex.
> 

I found the following test case, which reproduces a hang that I suspect may 
be the cause of the corruption:

host$ cd /tmp
host$ truncate -s 10G drive
host$ qemu-system-x86_64 -drive format=raw,file=drive,if=none,id=drive -device nvme,drive=drive,serial=1 [... more VM setup options]
guest$ cryptsetup luksFormat /dev/nvme0n1
[accept warning, use any password]
guest$ cryptsetup open /dev/nvme0n1 test
[enter password]
guest$ mkfs.ext4 /dev/mapper/test
[normal output...]
Creating journal (16384 blocks): [hangs forever]

I bisected this issue to:

cd2c7545ae1beac3b6aae033c7f31193b3255946 is the first bad commit
commit cd2c7545ae1beac3b6aae033c7f31193b3255946
Author: Changheun Lee <nanich.lee at samsung.com>
Date:   Mon May 3 18:52:03 2021 +0900

    bio: limit bio max size

I didn't try reverting this commit or further reducing the test case. 
Let me know if you need my kernel config or other information.

Regards,
Alex.
