Data corruption when using multiple devices with NVMEoF TCP

Ming Lei tom.leiming at gmail.com
Mon Jan 11 21:22:05 EST 2021


On Tue, Jan 12, 2021 at 9:33 AM Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> > Hey Hao,
> >
> >> Here is the entire log (and it's a new one, i.e. above snippet not
> >> included):
> >> https://drive.google.com/file/d/16ArIs5-Jw4P2f17A_ftKLm1A4LQUFpmg/view?usp=sharing
> >>
> >>
> >> What I found is that the data corruption does not always happen,
> >> especially when I copy a small directory. So I guess a lot of the log
> >> entries should just look fine.
> >
> > So this seems to be a breakage that has existed for some time now with
> > multipage bvecs, and you are the first one to report it. It seems to be
> > related to bio merges. It is strange to me that this only comes up now;
> > perhaps it is the combination with raid0 that triggers it, I'm not sure.
>
> OK, I think I understand what is going on. With multipage bvecs,
> bios can be split in the middle of a bvec entry and then merged
> back with another bio.

IMO, a bio split can land in the middle of a bvec even when the bvec
is single-page. The split just happens to be triggered in the case of raid
over nvme-tcp, and I guess it might be triggered by device mapper too.
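
To make this concrete, here is a stand-alone user-space sketch (not
kernel code; in the kernel the real walk is done by bio_split() and
bvec_iter_advance()) of how a split point expressed in 512-byte sectors,
e.g. a raid0 chunk boundary, can land inside a bvec even when every bvec
covers a single 4KB page. The types and names below are simplified
stand-ins, not the kernel's:

/*
 * Stand-alone model only -- not kernel code.  It mimics how a split
 * point expressed in 512-byte sectors is walked through a bio's bvec
 * table, roughly the way bio_split()/bvec_iter_advance() do.
 */
#include <stdio.h>

#define MODEL_SECTOR_SIZE 512u
#define MODEL_PAGE_SIZE   4096u

/* Simplified stand-in for struct bio_vec. */
struct model_bvec {
	unsigned int bv_len;	/* bytes covered by this entry */
};

int main(void)
{
	/* A bio carrying two single-page bvecs: 8KB = 16 sectors. */
	struct model_bvec vecs[] = {
		{ .bv_len = MODEL_PAGE_SIZE },
		{ .bv_len = MODEL_PAGE_SIZE },
	};
	unsigned int nvecs = sizeof(vecs) / sizeof(vecs[0]);

	/* Split after 10 sectors, e.g. at a raid0 chunk boundary. */
	unsigned int bytes = 10 * MODEL_SECTOR_SIZE;	/* 5120 bytes */

	for (unsigned int i = 0; i < nvecs; i++) {
		if (bytes < vecs[i].bv_len) {
			printf("split lands inside bvec %u: %u bytes go to "
			       "the front bio, %u bytes stay behind at a "
			       "non-zero in-page offset\n",
			       i, bytes, vecs[i].bv_len - bytes);
			return 0;
		}
		bytes -= vecs[i].bv_len;
	}
	printf("split falls exactly on a bvec boundary\n");
	return 0;
}

The remainder then starts at a non-zero in-page offset, which the merge
path has to account for when that bio is later combined with another one.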


Thanks,
Ming


