Data corruption when using multiple devices with NVMEoF TCP

Sagi Grimberg sagi at grimberg.me
Tue Jan 12 01:49:07 EST 2021


>>> Hey Hao,
>>>
>>>> Here is the entire log (and it's a new one, i.e. above snippet not
>>>> included):
>>>> https://drive.google.com/file/d/16ArIs5-Jw4P2f17A_ftKLm1A4LQUFpmg/view?usp=sharing
>>>>
>>>>
>>>> What I found is that the data corruption does not always happen,
>>>> especially when I copy a small directory. So I guess a lot of the
>>>> log entries should just look fine.
>>>
>>> So this seems to be a breakage that has existed for some time now with
>>> multipage bvecs, and you are the first one to report it. It
>>> seems to be related to bio merges, and it seems strange to me
>>> that this only comes up now; perhaps it is the combination with
>>> raid0 that triggers it, I'm not sure.
>>
>> OK, I think I understand what is going on. With multipage bvecs,
>> bios can be split in the middle of a bvec entry and then merge
>> back with another bio.
> 
> IMO, a bio split can happen in the middle of a bvec even when the bvec
> is a single page. The split may just happen to be triggered in the case of
> raid over nvme-tcp, and I guess it might be triggered by device mapper too.

Yes, and I couldn't find a case where it cannot happen, yet it only
triggered with mdraid. I'll wait for Hao to verify and then send a formal
patch.



More information about the Linux-nvme mailing list