nvmeof Issues with Zen 3/Ryzen 5000 Initiator
Sagi Grimberg
sagi at grimberg.me
Thu May 27 14:36:33 PDT 2021
> I've been testing NVMe over Fabrics for the past few weeks and the
> performance has been nothing short of incredible, though I'm running
> into some major issues that seem to be specifically related to AMD Zen
> 3 Ryzen chips (in my case I'm testing with 5900x).
>
> Target:
> Supermicro X10 board
> Xeon E5-2620v4
> Intel E810 NIC
>
> Problematic Client/initiator:
> ASRock X570 board
> Ryzen 9 5900x
> Intel E810 NIC
>
> Stable Client/initiator:
> Supermicro X10 board
> Xeon E5-2620v4
> Intel E810 NIC
>
> I'm using the same 2 E810 NICs and pair of 25G DACs in both cases. The
> NICs are directly connected with the DACs and there is no switch in the
> equation. To trigger the issue I'm simply running an fio command
> similar to this:
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G
> --readwrite=randread --time_based --runtime=1200
>
> I'm primarily using RDMA/iWARP right now but I've also tested RoCEv2,
> which presents the same issues/symptoms. Primary testing has been done
> with Ubuntu 20.04.2, with CentOS 8 in the mix as well just to try and
> rule out a weird distro-specific issue. All tests used the latest
> ice/irdma drivers from Intel (1.5.8 and 1.5.2 respectively).
CCing Shiraz Saleem who maintains irdma.
>
> I've not yet tested a Ryzen 5900x target with an Intel initiator but I
> plan to, to see if it exhibits the same instability.
>
> The issue presents itself as a connectivity loss between the two hosts -
> but there is no connectivity issue. The issue is also somewhat
> inconsistent. Sometimes it will show up after 1-2 minutes of testing,
> sometimes instantly, and sometimes close to 10 minutes in.
>
> Target dmesg sample:
> [ 3867.598007] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [ 3867.598384] nvmet: ctrl 1 fatal error occurred!
>
> Initiator dmesg sample:
> <snip>
> [ 348.122160] nvme nvme4: I/O 86 QID 17 timeout
> [ 348.122224] nvme nvme4: I/O 87 QID 17 timeout
> [ 348.122290] nvme nvme4: I/O 88 QID 17 timeout
> [ 348.122354] nvme nvme4: I/O 89 QID 17 timeout
> [ 348.122417] nvme nvme4: I/O 90 QID 17 timeout
> [ 348.122480] nvme nvme4: I/O 91 QID 17 timeout
> [ 348.122544] nvme nvme4: I/O 92 QID 17 timeout
> [ 348.122607] nvme nvme4: I/O 93 QID 17 timeout
> [ 348.122670] nvme nvme4: I/O 94 QID 17 timeout
> [ 348.122733] nvme nvme4: I/O 95 QID 17 timeout
> [ 348.122796] nvme nvme4: I/O 96 QID 17 timeout
> <snip>
> [ 380.387212] nvme nvme4: creating 24 I/O queues.
> [ 380.573925] nvme nvme4: Successfully reconnected (1 attempts)
>
> All the while the underlying connectivity is working just fine. There's
> a long delay between the timeout and the successful reconnect. I
> haven't timed it but it seems like about 5 minutes. This has luckily
> given me plenty of time to test connectivity, which has consistently
> been just fine on all fronts.
Seems like a loss of connectivity from the driver's perspective.
While this is happening, can you try an rdma application like
ib_send_bw/ib_send_lat or something?
I'd also suggest running both workloads concurrently and seeing if they
both suffer from the connectivity issue; that will help rule out
whether this is something specific to the nvme-rdma driver.
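Something like this should do it (just a sketch - I'm assuming the
perftest tools are installed, <rdma_dev> stands for whatever device name
ibv_devices reports for the E810, and -D runs the test for a fixed
number of seconds):

  # on the target
  ib_send_bw -d <rdma_dev> -D 60
  # on the initiator, while the fio job is running
  ib_send_bw -d <rdma_dev> -D 60 <target_ip>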
>
> I'm testing with a single Micron 9300 Pro 7.68TB right now which can
> push about 850k read IOPS. On the Intel target/initiator combo I can
> run it "balls to the walls" for hours on end with 0 issues. On the AMD
> initiator I can trigger the disconnect/drop generally within 5 minutes.
> Here's where things get weird - if I limit the test to 200k IOPS or less
> then it's relatively stable on the AMD and I've not seen any drops when
> this limitation is in place.
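If it helps to flip between the capped and uncapped cases with the same
job, fio can apply the limit itself (a sketch, assuming fio's rate
limiting is how you're capping it; --rate_iops is the option I mean):

  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G \
      --readwrite=randread --time_based --runtime=1200 --rate_iops=200000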
>
> Here are some things I've tried which make no difference (or make things
> worse):
>
> Ubuntu 20.04.2 kernel 5.4
> Ubuntu 20.04.2 kernel 5.8
> Ubuntu 20.04.2 kernel 5.10
> CentOS 8 kernel 4.18
> CentOS 8 kernel 5.10 (from elrepo)
> CentOS 8 kernel 5.12 (from elrepo) - the whole system actually freezes
> upon the "nvme connect" command on this one
> With and without multipath (native)
> With and without round-robin on multipath (native)
> Different NVMe drive models
> With and without PFC
> 10G DAC
> 25G DAC
> 25G DAC negotiated at 10G
> With and without a switch
> iWARP and RoCEv2
Looks like this probably always existed...
>
> I did do some testing with TCP/IP but cannot reach the >200k IOPS
> threshold with it, which seems to be important for triggering the issue.
> I did not experience the drops with TCP/IP.
>
> I can't seem to draw any conclusion other than this being something
> specific to Zen 3, but I'm not sure why. Is there somewhere I should be
> looking aside from "dmesg" to get some useful debug info? According to
> the irdma driver there are no rdma packets getting
> lost/dropped/erroring, etc. Common things like rping and
> ib_read_bw/ib_write_bw tests all run indefinitely without error.
Ah, that is an important detail.
I think a packet sniffer can help here if this is the case; IIRC
there should be a way to sniff rdma traffic using tcpdump, but I don't
remember the details. Perhaps the Intel folks can help you there...
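For what it's worth, iWARP is plain TCP on the wire, so in principle a
capture on the netdev would show the NVMe-oF connection - something
along these lines (4420 is the default NVMe over Fabrics port, and the
interface name is a placeholder):

  tcpdump -i <eth_iface> -w nvmeof.pcap 'tcp port 4420'

Whether the data path actually shows up there depends on how much the
RNIC offloads, and for RoCEv2 (UDP port 4791) the same caveat applies,
so the Intel folks would know the right way to capture on the E810.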