nvmeof Issues with Zen 3/Ryzen 5000 Initiator
Jonathan Wright
jonathan at knownhost.com
Wed May 26 13:47:05 PDT 2021
I've been testing NVMe over Fabrics for the past few weeks and the
performance has been nothing short of incredible, though I'm running
into some major issues that seem to be specific to AMD Zen 3 Ryzen
chips (in my case I'm testing with a 5900X).
Target:
Supermicro X10 board
Xeon E5-2620v4
Intel E810 NIC
Problematic Client/initiator:
ASRock X570 board
Ryzen 9 5900x
Intel E810 NIC
Stable Client/initiator:
Supermicro X10 board
Xeon E5-2620v4
Intel E810 NIC
I'm using the same 2 E810 NICs and pair of 25G DACs in both cases. The
NICs are directly connected with the DACs and there is no switch in the
equation. To trigger the issue I'm simply running fio with something
like this:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
--name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G
--readwrite=randread --time_based --runtime=1200
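(For completeness, the target is a standard nvmet RDMA port and the
initiator side is just a plain nvme-cli connect.  The address and NQN
below are placeholders rather than my real values:)

# 10.0.0.1 = target's E810 IP, <target-nqn> = the exported subsystem NQN
nvme connect -t rdma -a 10.0.0.1 -s 4420 -n <target-nqn>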
I'm primarily using RDMA/iWARP right now but I've also tested RoCE2
which presents the same issues/symptoms. Primary testing has been done
on Ubuntu 20.04.2, with CentOS 8 in the mix as well to try and rule out
a weird distro-specific issue. All tests used the latest ice/irdma
drivers from Intel (1.5.8 and 1.5.2, respectively).
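(In case it's relevant, the loaded driver versions and the transport the
RDMA device is presenting are easy to confirm on both ends; nothing
exotic here:)

# confirm which ice/irdma versions are actually loaded
modinfo ice | grep -i '^version'
modinfo irdma | grep -i '^version'
# confirm what the RDMA device reports (iWARP vs RoCE/Ethernet)
ibv_devinfo | grep -E 'hca_id|transport|link_layer'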
I've not yet tested a Ryzen 5900X target with an Intel initiator, but I
plan to in order to see whether it exhibits the same instability.
The issue presents itself as a connectivity loss between the two hosts -
but there is no actual connectivity problem. The timing is also somewhat
inconsistent: sometimes it shows up after 1-2 minutes of testing,
sometimes instantly, and sometimes close to 10 minutes in.
Target dmesg sample:
[ 3867.598007] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[ 3867.598384] nvmet: ctrl 1 fatal error occurred!
Initiator dmesg sample:
<snip>
[ 348.122160] nvme nvme4: I/O 86 QID 17 timeout
[ 348.122224] nvme nvme4: I/O 87 QID 17 timeout
[ 348.122290] nvme nvme4: I/O 88 QID 17 timeout
[ 348.122354] nvme nvme4: I/O 89 QID 17 timeout
[ 348.122417] nvme nvme4: I/O 90 QID 17 timeout
[ 348.122480] nvme nvme4: I/O 91 QID 17 timeout
[ 348.122544] nvme nvme4: I/O 92 QID 17 timeout
[ 348.122607] nvme nvme4: I/O 93 QID 17 timeout
[ 348.122670] nvme nvme4: I/O 94 QID 17 timeout
[ 348.122733] nvme nvme4: I/O 95 QID 17 timeout
[ 348.122796] nvme nvme4: I/O 96 QID 17 timeout
<snip>
[ 380.387212] nvme nvme4: creating 24 I/O queues.
[ 380.573925] nvme nvme4: Successfully reconnected (1 attempts)
All the while the underlying connectivity keeps working just fine.
There's a long delay between the timeouts and the successful reconnect;
I haven't timed it precisely, but it seems to be about 5 minutes. That
has luckily given me plenty of time to test connectivity, which has
consistently checked out on all fronts.
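(For anyone trying to reproduce the timing: the reconnect cadence and
give-up window can be pinned down explicitly at connect time with the
standard nvme-cli fabrics options - the values below are only examples,
not what I'm running:)

# make the reconnect behaviour explicit (values are examples)
nvme connect -t rdma -a 10.0.0.1 -s 4420 -n <target-nqn> \
    --reconnect-delay=10 --ctrl-loss-tmo=600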
I'm testing with a single Micron 9300 Pro 7.68TB right now, which can
push about 850k read IOPS. On the Intel target/initiator combo I can
run it "balls to the wall" for hours on end with zero issues. On the
AMD initiator I can generally trigger the disconnect/drop within 5
minutes. Here's where things get weird - if I limit the test to 200K
IOPS or less then it's relatively stable on the AMD side, and I've not
seen any drops with that limitation in place.
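(For anyone wanting to reproduce the capped runs: fio's rate limiting
is the straightforward way to hold the job at or under 200K, i.e. the
same job as above with a rate_iops cap:)

# same random-read job as above, capped at roughly 200k IOPS
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G \
    --readwrite=randread --time_based --runtime=1200 --rate_iops=200000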
Here are some things I've tried which make no difference (or make things
worse):
Ubuntu 20.04.2 kernel 5.4
Ubuntu 20.04.2 kernel 5.8
Ubuntu 20.04.2 kernel 5.10
CentOS 8 kernel 4.18
CentOS 8 kernel 5.10 (from elrepo)
CentOS 8 kernel 5.12 (from elrepo) - whole system actually freezes upon
"nvme connect" command on this one
With and without native multipath (toggling sketched after this list)
With and without round-robin on native multipath
Different NVMe drive models
With and without PFC
10G DAC
25G DAC
25G DAC negotiated at 10G
With and without a switch
iWARP and RoCE2
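(For the two multipath entries above: native multipath is the nvme_core
module option, and round-robin is the per-subsystem iopolicy. The
subsystem number in the sysfs path below is illustrative:)

# boot with native NVMe multipath on or off:
#   nvme_core.multipath=Y   /   nvme_core.multipath=N
# switch a subsystem's I/O policy to round-robin
echo round-robin > /sys/class/nvme-subsystem/nvme-subsys0/iopolicy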
I did do some testing with NVMe over TCP as well, but I cannot reach the
>200k IOPS threshold with it, which seems to be important for triggering
the issue. I did not experience the drops over TCP.
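(The TCP runs are the same setup over the tcp transport, with the same
placeholders as the rdma connect above:)

nvme connect -t tcp -a 10.0.0.1 -s 4420 -n <target-nqn>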
I can't seem to draw any conclusion other than this being something
specific to Zen 3, but I'm not sure why. Is there somewhere I should be
looking aside from "dmesg" to get some useful debug info? According to
the irdma driver there are no RDMA packets getting lost, dropped, or
errored, etc. Common things like rping and ib_read_bw/ib_write_bw tests
all run indefinitely without error.
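(Those checks are nothing exotic - roughly the standard librdmacm and
perftest invocations, with the same placeholder address as above:)

# on the target (server side)
rping -s -a 10.0.0.1 -v
ib_read_bw
# on the initiator (client side)
rping -c -a 10.0.0.1 -v
ib_read_bw 10.0.0.1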
I would appreciate any help or advice with this, or pointers on how I
can help confirm whether this is indeed specific to Zen 3.
--
Jonathan Wright
KnownHost, LLC
https://www.knownhost.com