Data corruption when using multiple devices with NVMEoF TCP

Sagi Grimberg sagi at grimberg.me
Wed Dec 23 03:41:30 EST 2020


> Also really strange to me. This has been burning me 16+ hours a day
> for 2 days doing
> 
> And for your question, yes I did.
> Locally on the target side, no data corruption happening, with the
> same process of creating a partition on each device, creating a
> 2-device raid-0 volume, and creating a filesystem.
> I have also tested on multiple sets of machines, but no luck.
> 
> Another point I should've mentioned is that corruption does not always
> happen. Sometimes if I only copy one .gz file (~100MB), it seems fine.
> But whenever I copy a large directory with many .gz files (~100GB in
> total), there are always some .gz files corrupted.

OK, interesting.

Can you retry the test with max_sectors_kb set to 512 on each device?
echo 512 > /sys/block/nvmeXnY/queue/max_sectors_kb

I'm trying to understand if there is an issue related
to large IOs.
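
A rough sketch for applying the limit to both namespaces under the raid-0
volume in one go. The function name set_max_sectors_kb, the SYSFS_ROOT
override, and the device names in the usage line are placeholders, not taken
from the setup described above. It needs to run as root on the host side.

```shell
# Sketch: write the same max_sectors_kb limit to several block devices.
# SYSFS_ROOT is a hypothetical override (defaults to /sys/block) so the
# function can be tried against a scratch directory first.
set_max_sectors_kb() {
    limit_kb="$1"
    shift
    for dev in "$@"; do
        attr="${SYSFS_ROOT:-/sys/block}/$dev/queue/max_sectors_kb"
        # Skip devices that do not exist or that we cannot write to.
        if [ ! -w "$attr" ]; then
            echo "skip $dev: $attr not writable" >&2
            continue
        fi
        echo "$limit_kb" > "$attr"
        # Read the value back to confirm the kernel accepted it.
        echo "$dev: max_sectors_kb=$(cat "$attr")"
    done
}
```

Usage would be something like `set_max_sectors_kb 512 nvme1n1 nvme2n1`,
with the real namespace names substituted. The kernel rejects values above
the device's max_hw_sectors_kb, and the setting does not persist across a
reconnect or reboot, so it has to be reapplied before each test run.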


