Performance Regression due to ASPM disable patch
Anuj Gupta
anuj20.g at samsung.com
Thu Jul 13 05:49:14 PDT 2023
On Thu, Jul 13, 2023 at 07:59:32AM +0200, Heiner Kallweit wrote:
> On 12.07.2023 17:55, Anuj Gupta wrote:
> > Hi,
> >
> > I see a performance regression for read/write workloads on our NVMe over
> > Fabrics setup that uses TCP as the transport.
> > IOPS drop by 23% for 4k-randread [1] and by 18% for 4k-randwrite [2].
> >
> > I bisected and found that the commit
> > e1ed3e4d91112027b90c7ee61479141b3f948e6a ("r8169: disable ASPM during
> > NAPI poll") is the trigger.
> > When I revert this commit, the performance drop goes away.
> >
> > The target machine uses a Realtek Ethernet controller:
> > root@testpc:/home/test# lspci | grep -i eth
> > 29:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 2600 (rev 21)
> > 2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Killer E3000 2.5GbE Controller (rev 03)
> >
> > I tried to disable ASPM by passing "pcie_aspm=off" as a boot parameter and
> > by setting the PCIe ASPM policy to performance, but neither improved the
> > performance.
> > I wonder if this is already known, and whether something different should
> > be done to handle the original issue?
> >
> > [1] fio randread
> > fio -direct=1 -iodepth=1 -rw=randread -ioengine=psync -bs=4k -numjobs=1
> > -runtime=30 -group_reporting -filename=/dev/nvme1n1 -name=psync_read
> > -output=psync_read
> > [2] fio randwrite
> > fio -direct=1 -iodepth=1 -rw=randwrite -ioengine=psync -bs=4k -numjobs=1
> > -runtime=30 -group_reporting -filename=/dev/nvme1n1 -name=psync_read
> > -output=psync_write
> >
> >
> I can imagine a certain performance impact of this commit if there are
> lots of small packets handled by individual NAPI polls.
> Maybe it's also chip version specific.
> You have two NICs, do you see the issue with both of them?
I see this issue with the Realtek Semiconductor Co., Ltd. Killer NIC.
I haven't used the other NIC.
> Related: What's your line speed, 1Gbps or 2.5Gbps?
Speed is 1000Mb/s [1].
> Can you reproduce the performance impact with iperf?
I was not able to reproduce it with iperf [2]. One reason could be that the
performance drop currently shows up in the NVMe over Fabrics scenario, where
block I/O processing takes some time before the next I/O, and hence the next
network packets, are sent. I suspect iperf sends packets continuously rather
than at intervals. Let me know if I am missing something here.
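
One way to test this hypothesis would be a request/response test instead of a streaming one. With -iodepth=1 and -numjobs=1, each I/O waits for the previous one to complete, so IOPS is roughly the inverse of per-I/O latency, and a bulk-throughput tool such as iperf would not show a latency change. The following is only a minimal sketch of such a ping-pong test, not a tool used in this thread: the function names, the 4096-byte message size, and the think_time parameter are assumptions made for illustration.

```python
import socket
import statistics
import threading
import time


def _recv_exact(conn, size):
    """Read exactly `size` bytes, or return b"" if the peer closed."""
    buf = bytearray()
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            return b""
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, size):
    """Accept one client and echo back every `size`-byte message."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            msg = _recv_exact(conn, size)
            if not msg:
                break
            conn.sendall(msg)


def ping_pong(host, port, size=4096, count=1000, think_time=0.0):
    """Send one message, wait for the echo, optionally idle, repeat.

    Returns the list of round-trip times in seconds.
    """
    payload = b"x" * size
    rtts = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(payload)
            reply = _recv_exact(sock, size)
            if len(reply) != size:
                raise ConnectionError("peer closed the connection early")
            rtts.append(time.perf_counter() - start)
            if think_time:
                # Hypothetical stand-in for block I/O processing time
                # between two consecutive requests.
                time.sleep(think_time)
    return rtts


if __name__ == "__main__":
    # Loopback self-test only. For a real measurement, echo_server would
    # run on the target and ping_pong on the initiator, over the r8169 link.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    threading.Thread(
        target=echo_server, args=(listener, 4096), daemon=True
    ).start()
    rtts = ping_pong("127.0.0.1", port, count=200, think_time=0.0001)
    print(f"median RTT: {statistics.median(rtts) * 1e6:.1f} us")
```

Comparing the median round-trip time with and without the commit, for a few think_time values, might show whether the idle gap between packets is what matters.
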
> Do you use any network optimization settings for latency vs. performance?
No, I haven't set any network optimization settings. We are using
default Ubuntu values. If you suspect some particular setting, I can check.
> Interrupt coalescing, is TSO(6) enabled?
I tried this command on a different PC containing the same Realtek NIC and
an Intel NIC. The command worked fine for the Intel NIC but failed for the
Realtek NIC, so the error seems to be specific to the Realtek NIC.
Is there some other way to check for interrupt coalescing?
> An ethtool -k output may provide further insight.
Please see [3].
[1]
# ethtool enp42s0
Settings for enp42s0:
Speed: 1000Mb/s
[2]
WITH ASPM patch :
------------------------------------------------------------
# iperf -c 107.99.41.147 -l 4096 -i 1 -t 10
------------------------------------------------------------
Client connecting to 107.99.41.147, TCP port 5001
TCP window size: 531 KByte (default)
------------------------------------------------------------
[ 3] local 107.99.41.244 port 40340 connected with 107.99.41.147 port 5001
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
-----------------------------------------------------------
WITHOUT ASPM patch :
------------------------------------------------------------
# iperf -c 107.99.41.147 -l 4096 -i 1 -t 10
------------------------------------------------------------
Client connecting to 107.99.41.147, TCP port 5001
TCP window size: 472 KByte (default)
------------------------------------------------------------
[ 3] local 107.99.41.244 port 51766 connected with 107.99.41.147 port 5001
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
[3]
# ethtool -k enp42s0
Features for enp42s0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
More information about the Linux-nvme mailing list