Sequential read from NVMe/XFS twice as slow on Fedora 42 as on Rocky 9.5

Laurence Oberman loberman at redhat.com
Mon May 5 05:29:21 PDT 2025


On Mon, 2025-05-05 at 07:50 +1000, Dave Chinner wrote:
> [cc linux-block]
> 
> [original bug report:
> https://lore.kernel.org/linux-xfs/CAAiJnjoo0--yp47UKZhbu8sNSZN6DZ-QzmZBMmtr1oC=fOOgAQ@mail.gmail.com/
>  ]
> 
> On Sun, May 04, 2025 at 10:22:58AM +0300, Anton Gavriliuk wrote:
> > > What's the comparative performance of an identical read profile
> > > directly on the raw MD raid0 device?
> > 
> > Rocky 9.5 (5.14.0-503.40.1.el9_5.x86_64)
> > 
> > [root@localhost ~]# df -mh /mnt
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/md127       35T  1.3T   34T   4% /mnt
> > 
> > [root@localhost ~]# fio --name=test --rw=read --bs=256k
> > --filename=/dev/md127 --direct=1 --numjobs=1 --iodepth=64 --exitall
> > --group_reporting --ioengine=libaio --runtime=30 --time_based
> > test: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
> > fio-3.39-44-g19d9
> > Starting 1 process
> > Jobs: 1 (f=1): [R(1)][100.0%][r=81.4GiB/s][r=334k IOPS][eta 00m:00s]
> > test: (groupid=0, jobs=1): err= 0: pid=43189: Sun May  4 08:22:12 2025
> >   read: IOPS=363k, BW=88.5GiB/s (95.1GB/s)(2656GiB/30001msec)
> >     slat (nsec): min=971, max=312380, avg=1817.92, stdev=1367.75
> >     clat (usec): min=78, max=1351, avg=174.46, stdev=28.86
> >      lat (usec): min=80, max=1352, avg=176.27, stdev=28.81
> > 
> > Fedora 42 (6.14.5-300.fc42.x86_64)
> > 
> > [root@localhost anton]# df -mh /mnt
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/md127       35T  1.3T   34T   4% /mnt
> > 
> > [root@localhost ~]# fio --name=test --rw=read --bs=256k
> > --filename=/dev/md127 --direct=1 --numjobs=1 --iodepth=64 --exitall
> > --group_reporting --ioengine=libaio --runtime=30 --time_based
> > test: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
> > fio-3.39-44-g19d9
> > Starting 1 process
> > Jobs: 1 (f=1): [R(1)][100.0%][r=41.0GiB/s][r=168k IOPS][eta 00m:00s]
> > test: (groupid=0, jobs=1): err= 0: pid=5685: Sun May  4 10:14:00 2025
> >   read: IOPS=168k, BW=41.0GiB/s (44.1GB/s)(1231GiB/30001msec)
> >     slat (usec): min=3, max=273, avg= 5.63, stdev= 1.48
> >     clat (usec): min=67, max=2800, avg=374.99, stdev=29.90
> >      lat (usec): min=72, max=2914, avg=380.62, stdev=30.22
> 
> So the MD block device shows the same read performance as the
> filesystem on top of it. That means this is a regression at the MD
> device layer or in the block/driver layers below it. i.e. it is not
> an XFS or filesystem issue at all.
> 
> -Dave.

I have a lab setup; let me see if I can also reproduce this and then
trace it to see where the time is being spent, roughly along the lines
of the sketch below.
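
A minimal sketch, assuming /dev/nvme0n1 stands in for one of the raid0
member devices (the real device names and output basenames will differ
on the lab box):

# Same fio job against a single member device, to separate MD itself
# from the NVMe/block layers underneath it
fio --name=member --rw=read --bs=256k --filename=/dev/nvme0n1 --direct=1 \
    --numjobs=1 --iodepth=64 --ioengine=libaio --runtime=30 --time_based \
    --group_reporting

# Block-layer trace of the MD device while the slow run is in flight
blktrace -d /dev/md127 -w 30 -o md127
blkparse -i md127 -d md127.bin
btt -i md127.bin        # per-phase latency breakdown (Q2C, D2C, ...)

# Kernel-side profile of the submission path (avg slat roughly triples
# on 6.14: ~1.8us on Rocky 9.5 vs ~5.6us on Fedora 42)
perf record -a -g -- sleep 30
perf report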



