Sequential read from NVMe/XFS twice as slow on Fedora 42 as on Rocky 9.5

Laurence Oberman loberman at redhat.com
Mon May 5 06:21:19 PDT 2025


On Mon, 2025-05-05 at 08:29 -0400, Laurence Oberman wrote:
> On Mon, 2025-05-05 at 07:50 +1000, Dave Chinner wrote:
> > [cc linux-block]
> > 
> > [original bug report:
> > https://lore.kernel.org/linux-xfs/CAAiJnjoo0--yp47UKZhbu8sNSZN6DZ-QzmZBMmtr1oC=fOOgAQ@mail.gmail.com/
> >  ]
> > 
> > On Sun, May 04, 2025 at 10:22:58AM +0300, Anton Gavriliuk wrote:
> > > > What's the comparitive performance of an identical read profile
> > > > directly on the raw MD raid0 device?
> > > 
> > > Rocky 9.5 (5.14.0-503.40.1.el9_5.x86_64)
> > > 
> > > [root@localhost ~]# df -mh /mnt
> > > Filesystem      Size  Used Avail Use% Mounted on
> > > /dev/md127       35T  1.3T   34T   4% /mnt
> > > 
> > > [root@localhost ~]# fio --name=test --rw=read --bs=256k
> > > --filename=/dev/md127 --direct=1 --numjobs=1 --iodepth=64
> > > --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based
> > > test: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB,
> > > (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
> > > fio-3.39-44-g19d9
> > > Starting 1 process
> > > Jobs: 1 (f=1): [R(1)][100.0%][r=81.4GiB/s][r=334k IOPS][eta 00m:00s]
> > > test: (groupid=0, jobs=1): err= 0: pid=43189: Sun May  4 08:22:12 2025
> > >   read: IOPS=363k, BW=88.5GiB/s (95.1GB/s)(2656GiB/30001msec)
> > >     slat (nsec): min=971, max=312380, avg=1817.92, stdev=1367.75
> > >     clat (usec): min=78, max=1351, avg=174.46, stdev=28.86
> > >      lat (usec): min=80, max=1352, avg=176.27, stdev=28.81
> > > 
> > > Fedora 42 (6.14.5-300.fc42.x86_64)
> > > 
> > > [root@localhost anton]# df -mh /mnt
> > > Filesystem      Size  Used Avail Use% Mounted on
> > > /dev/md127       35T  1.3T   34T   4% /mnt
> > > 
> > > [root@localhost ~]# fio --name=test --rw=read --bs=256k
> > > --filename=/dev/md127 --direct=1 --numjobs=1 --iodepth=64
> > > --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based
> > > test: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB,
> > > (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
> > > fio-3.39-44-g19d9
> > > Starting 1 process
> > > Jobs: 1 (f=1): [R(1)][100.0%][r=41.0GiB/s][r=168k IOPS][eta 00m:00s]
> > > test: (groupid=0, jobs=1): err= 0: pid=5685: Sun May  4 10:14:00 2025
> > >   read: IOPS=168k, BW=41.0GiB/s (44.1GB/s)(1231GiB/30001msec)
> > >     slat (usec): min=3, max=273, avg= 5.63, stdev= 1.48
> > >     clat (usec): min=67, max=2800, avg=374.99, stdev=29.90
> > >      lat (usec): min=72, max=2914, avg=380.62, stdev=30.22
> > 
> > So the MD block device shows the same read performance as the
> > filesystem on top of it. That means this is a regression at the MD
> > device layer or in the block/driver layers below it, i.e. it is not
> > an XFS or filesystem issue at all.
> > 
> > -Dave.
> 
> I have a lab setup; let me see if I can also reproduce this and then
> trace it to see where the time is being spent.
> 


I am not seeing a full 2x drop in bandwidth, but the Fedora 42 kernel is
still significantly slower here.
I will trace it.
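
One way to chase it (only a sketch; the 30-second window and the output
file names are arbitrary) is to compare request sizes and block-layer
latencies on the md device while the fio job runs, plus a whole-system
profile:

    # watch the average request size reaching md127 and its nvme members
    iostat -xm 1

    # capture block-layer events on md127 for 30s, then post-process
    blktrace -d /dev/md127 -w 30 -o md127_trace
    blkparse -i md127_trace -d md127_trace.bin
    btt -i md127_trace.bin

    # whole-system profile while fio is running
    perf record -a -g -- sleep 30
    perf report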

Rocky 9.5 kernel - 5.14.0-503.40.1.el9_5.x86_64

Run status group 0 (all jobs):
   READ: bw=14.7GiB/s (15.8GB/s), 14.7GiB/s-14.7GiB/s (15.8GB/s-15.8GB/s), io=441GiB (473GB), run=30003-30003msec

Fedora 42 kernel - 6.14.5-300.fc42.x86_64

Run status group 0 (all jobs):
   READ: bw=10.4GiB/s (11.2GB/s), 10.4GiB/s-10.4GiB/s (11.2GB/s-11.2GB/s), io=313GiB (336GB), run=30001-30001msec
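
In both Anton's runs and mine the mean completion latency roughly doubles
at the same iodepth (clat avg ~174 usec on Rocky vs ~375 usec on Fedora in
Anton's numbers), so the extra time is being spent per I/O rather than in
deeper queueing. Since Dave points at MD or the layers below it, it may
also be worth repeating the same job against a single raid0 member on both
kernels; if a lone namespace regresses too, the problem is below MD. A
minimal sketch, assuming /dev/nvme0n1 is one of the md127 members (check
/proc/mdstat for the real names):

    # same workload as above, pointed at one raid0 member instead of md127
    fio --name=member-test --rw=read --bs=256k --filename=/dev/nvme0n1 \
        --direct=1 --numjobs=1 --iodepth=64 --exitall --group_reporting \
        --ioengine=libaio --runtime=30 --time_based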
