nvmeof Issues with Zen 3/Ryzen 5000 Initiator
Jonathan Wright
jonathan at knownhost.com
Thu Jun 3 09:57:02 PDT 2021
>> I've been testing NVMe over Fabrics for the past few weeks and the
>> performance has been nothing short of incredible, though I'm running
>> into some major issues that seem to be specifically related to AMD
>> Zen 3 Ryzen chips (in my case I'm testing with 5900x).
>>
>> Target:
>> Supermicro X10 board
>> Xeon E5-2620v4
>> Intel E810 NIC
>>
>> Problematic Client/initiator:
>> ASRock X570 board
>> Ryzen 9 5900x
>> Intel E810 NIC
>>
>> Stable Client/initiator:
>> Supermicro X10 board
>> Xeon E5-2620v4
>> Intel E810 NIC
>>
>> I'm using the same 2 E810 NICs and pair of 25G DACs in both cases.
>> The NICs are directly connected with the DACs and there is no switch
>> in the equation. To trigger the issue I'm simply using FIO similar
>> to this:
>>
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G
>> --readwrite=randread --time_based --runtime=1200
>>
>> I'm primarily using RDMA/iWARP right now, but I've also tested RoCE2,
>> which presents the same issues/symptoms. Primary testing has been
>> done with Ubuntu 20.04.2, with CentOS 8 in the mix as well to try to
>> rule out a weird distro-specific issue. All tests used the latest
>> ice/irdma drivers from Intel (1.5.8 and 1.5.2, respectively).
>
> CCing Shiraz Saleem who maintains irdma.
Thanks. I've done some testing now with Mellanox ConnectX-4 cards with
Zen 3 and the issue does not exist. This seems to point the finger at
something specific between irdma and Zen 2/3, since irdma/E810 works
fine on all-Intel hardware. I tested the Mellanox cards on both the
Ubuntu 20.04 stock kernel (5.4) and CentOS 8.3 (stock kernel 4.18),
the same combinations on which I tested the E810.
Further, I tested an AMD target with an Intel initiator and the issue
still exists, so it doesn't seem to matter which end the Zen 3 (and/or
Zen 2) chip is on when paired with an E810/irdma.
The issue also exists with Zen 2 (Ryzen 3600).
@Shiraz, since I guess this isn't a common setup right now, let me know
if I can be of any assistance with getting to the bottom of this
seeming incompatibility.
>
>>
>> I've not yet tested a Ryzen 5900x target with an Intel initiator, but
>> I plan to, to see if it exhibits the same instability.
>>
>> The issue presents itself as a connectivity loss between the two
>> hosts, but there is no actual connectivity problem at the network
>> level. The issue is also somewhat inconsistent: sometimes it shows up
>> after 1-2 minutes of testing, sometimes instantly, and sometimes
>> close to 10 minutes in.
>>
>> Target dmesg sample:
>> [ 3867.598007] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>> [ 3867.598384] nvmet: ctrl 1 fatal error occurred!
>>
>> Initiator dmesg sample:
>> <snip>
>> [ 348.122160] nvme nvme4: I/O 86 QID 17 timeout
>> [ 348.122224] nvme nvme4: I/O 87 QID 17 timeout
>> [ 348.122290] nvme nvme4: I/O 88 QID 17 timeout
>> [ 348.122354] nvme nvme4: I/O 89 QID 17 timeout
>> [ 348.122417] nvme nvme4: I/O 90 QID 17 timeout
>> [ 348.122480] nvme nvme4: I/O 91 QID 17 timeout
>> [ 348.122544] nvme nvme4: I/O 92 QID 17 timeout
>> [ 348.122607] nvme nvme4: I/O 93 QID 17 timeout
>> [ 348.122670] nvme nvme4: I/O 94 QID 17 timeout
>> [ 348.122733] nvme nvme4: I/O 95 QID 17 timeout
>> [ 348.122796] nvme nvme4: I/O 96 QID 17 timeout
>> <snip>
>> [ 380.387212] nvme nvme4: creating 24 I/O queues.
>> [ 380.573925] nvme nvme4: Successfully reconnected (1 attempts)
>>
>> All the while the underlying connectivity is working just fine.
>> There's a long delay between the timeout and the successful
>> reconnect. I haven't timed it but it seems like about 5 minutes.
>> This has luckily given me plenty of time to test connectivity, which
>> has consistently been just fine on all fronts.
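For what it's worth, the delay before giving up and reconnecting is
tunable from the initiator side; nvme-cli accepts the relevant knobs at
connect time. The values here are illustrative, not recommendations
(a full connect invocation is shown further down):

    # retry every 5 seconds, give up after 120 seconds of controller loss
    nvme connect ... --reconnect-delay=5 --ctrl-loss-tmo=120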
>
> Seems like loss of connectivity from the driver perspective.
> While this is happening, can you try an rdma application like
> ib_send_bw/ib_send_lat or something?
>
> I'd also suggest running both workloads concurrently and seeing if
> they both suffer from a connectivity issue; this will help rule out
> something specific to the nvme-rdma driver.
>
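For reference, a minimal perftest invocation would look something like
the following; the RDMA device name and the address are placeholders,
and -R sets the connection up through rdma_cm, which iWARP requires:

    # on the target (server) side
    ib_send_bw -d irdma0 -R

    # on the initiator (client) side, pointing at the server's address
    ib_send_bw -d irdma0 -R 10.0.0.1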
>>
>> I'm testing with a single Micron 9300 Pro 7.68TB right now which can
>> push about 850k read IOPS. On the Intel target/initiator combo I can
>> run it "balls to the walls" for hours on end with 0 issues. On the
>> AMD initiator I can trigger the disconnect/drop generally within 5
>> minutes. Here's where things get weird: if I limit the test to 200k
>> IOPS or less, then it's relatively stable on the AMD and I've not
>> seen any drops while that limit is in place.
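fio's rate_iops option is one way to impose such a cap; a sketch based
on the command above, throttled to 200k IOPS:

    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
        --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 \
        --size=10G --readwrite=randread --time_based --runtime=1200 \
        --rate_iops=200000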
>>
>> Here are some things I've tried which make no difference (or make
>> things worse):
>>
>> Ubuntu 20.04.2 kernel 5.4
>> Ubuntu 20.04.2 kernel 5.8
>> Ubuntu 20.04.2 kernel 5.10
>> CentOS 8 kernel 4.18
>> CentOS 8 kernel 5.10 (from elrepo)
>> CentOS 8 kernel 5.12 (from elrepo) - the whole system actually
>> freezes upon the "nvme connect" command on this one (see the sample
>> connect invocation after this list)
>> With and without multipath (native)
>> With and without round-robin on multipath (native)
>> Different NVMe drive models
>> With and without PFC
>> 10G DAC
>> 25G DAC
>> 25G DAC negotiated at 10G
>> With and without a switch
>> iWARP and RoCE2
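For completeness, a representative discover/connect sequence for these
tests; the address and subsystem NQN are placeholders:

    # sanity-check what the target advertises
    nvme discover -t rdma -a 10.0.0.1 -s 4420

    # connect over RDMA; on the 5.12 kernel above, the freeze hits here
    nvme connect -t rdma -a 10.0.0.1 -s 4420 \
        -n nqn.2021-06.com.example:testsubsys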
>
> Looks like this probably always existed...
>
>>
>> I did do some testing with NVMe over TCP, but I cannot reach the
>> 200k+ IOPS threshold with it, which seems to be important for
>> triggering the issue. I did not experience the drops with TCP.
>>
>> I can't seem to draw any conclusion other than this being something
>> specific to Zen 3, but I'm not sure why. Is there somewhere I should
>> be looking aside from "dmesg" to get some useful debug info?
>> According to the irdma driver there are no rdma packets getting
>> lost/dropped/erroring, etc. Common things like rping and
>> ib_read_bw/ib_write_bw tests all run indefinitely without error.
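For reference, the per-device hardware counters can be read straight
out of sysfs, which is an easy way to double-check the driver's
"nothing dropped" claim; the device name below is a placeholder:

    # dump every hardware counter for port 1 of the RDMA device
    for c in /sys/class/infiniband/irdma0/ports/1/hw_counters/*; do
        printf '%s: %s\n' "$(basename "$c")" "$(cat "$c")"
    done

    # or, with a recent enough kernel and iproute2
    rdma statistic show link irdma0/1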
>
> Ah, that is an important detail.
>
> I think a packet sniffer can help here if this is the case. IIRC
> there should be a way to sniff rdma traffic using tcpdump, but I
> don't remember the details. Perhaps Intel folks can help you there...
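One note on capturing: iWARP rides on TCP, so a plain tcpdump on the
Ethernet netdev is worth a try (4420 is the standard NVMe-oF port; the
interface name is a placeholder). RDMA traffic is terminated in the
NIC, though, so a host-side capture may see little of it; a tap or a
switch port mirror is more reliable:

    tcpdump -i enp1s0f0 -w nvmeof.pcap port 4420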
--
Jonathan Wright
KnownHost, LLC
https://www.knownhost.com