Data corruption when using multiple devices with NVMEoF TCP

Sagi Grimberg sagi at grimberg.me
Thu Dec 24 12:56:17 EST 2020


> Sagi, thanks a lot for helping look into this.
> 
>> Question, if you build the raid0 in the target and expose that over nvmet-tcp (with a single namespace), does the issue happen?
> No, it works fine in that case.
> Actually with this setup, initially the latency was pretty bad, and it
> seems enabling CONFIG_NVME_MULTIPATH improved it significantly.
> I'm not exactly sure though as I've changed too many things and didn't
> specifically test for this setup.
> Could you help confirm that?
> 
> And after applying your patch,
>   - With the problematic setup, i.e. creating a 2-device raid0, I did
> see numerous prints popping up in dmesg; a few lines are
> pasted below:
>   - With the good setup, i.e. only using 1 device, this line also pops
> up, but a lot less frequently.

Hao, question: which I/O scheduler is in use for the nvme-tcp devices?

Can you try to reproduce this issue when disabling merges on the
nvme-tcp devices?

echo 2 > /sys/block/nvmeXnY/queue/nomerges

I want to see if this is an issue with merged bios.
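For reference, a minimal sketch combining both checks above -- inspecting the
active scheduler and disabling merges on each nvme-tcp namespace. The device
names nvme0n1 and nvme1n1 are placeholders; substitute the namespaces that
actually back your raid0:

```shell
# Hypothetical device names; replace with your nvme-tcp namespaces.
for dev in nvme0n1 nvme1n1; do
    # The active I/O scheduler is shown in brackets, e.g. [mq-deadline]
    cat /sys/block/$dev/queue/scheduler
    # nomerges: 0 = merges enabled, 1 = simple merges only, 2 = no merges
    echo 2 > /sys/block/$dev/queue/nomerges
done
```

Note that nomerges resets to its default on reboot, so re-apply it before
re-running the reproducer.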



More information about the Linux-nvme mailing list