nvmeof Issues with Zen 3/Ryzen 5000 Initiator

Jonathan Wright jonathan at knownhost.com
Thu Jun 3 09:57:02 PDT 2021


>> I've been testing NVMe over Fabrics for the past few weeks and the 
>> performance has been nothing short of incredible, though I'm running 
>> into some major issues that seem to be specifically related to AMD 
>> Zen 3 Ryzen chips (in my case I'm testing with 5900x).
>>
>> Target:
>> Supermicro X10 board
>> Xeon E5-2620v4
>> Intel E810 NIC
>>
>> Problematic Client/initiator:
>> ASRock X570 board
>> Ryzen 9 5900x
>> Intel E810 NIC
>>
>> Stable Client/initiator:
>> Supermicro X10 board
>> Xeon E5-2620v4
>> Intel E810 NIC
>>
>> I'm using the same 2 E810 NICs and pair of 25G DACs in both cases.  
>> The NICs are directly connected with the DACs and there is no switch 
>> in the equation.  To trigger the issue I'm simply using FIO similar 
>> to this:
>>
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 
>> --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G 
>> --readwrite=randread --time_based --runtime=1200
>>
>> I'm primarily using RDMA/iWARP right now, but I've also tested RoCEv2, 
>> which presents the same issues/symptoms.  Primary testing has been 
>> done with Ubuntu 20.04.2, with CentOS 8 in the mix as well just to try 
>> to rule out a weird distro-specific issue.  All tests used the latest 
>> ice/irdma drivers from Intel (1.5.8 and 1.5.2, respectively).
>
> CCing Shiraz Saleem who maintains irdma.

Thanks.  I've now done some testing with Mellanox ConnectX-4 cards with 
Zen 3 and the issue does not exist there.  This seems to point the finger 
at something specific between irdma and Zen 2/3, since irdma/E810 works 
fine on all-Intel hardware.  I tested the Mellanox cards on both the 
Ubuntu 20.04 stock kernel (5.4) and CentOS 8.3 (stock kernel 4.18), the 
same combinations on which I tested the E810.

Further, I tested an AMD target with an Intel initiator and the issue 
still exists, so it doesn't seem to matter which end the Zen 3 (and/or 
Zen 2) chip is on when paired with an E810/irdma.

The issue also exists with Zen 2 (Ryzen 3600).

@Shiraz, since I guess this isn't a common setup right now, let me know 
if I can be of any assistance with getting to the bottom of this apparent 
incompatibility.

>
>>
>> I've not yet tested a Ryzen 5900x target with an Intel initiator, but 
>> I plan to, to see if it exhibits the same instability.
>>
>> The issue presents itself as a connectivity loss between the two 
>> hosts, but there is no actual connectivity issue.  The issue is also 
>> somewhat inconsistent: sometimes it shows up after 1-2 minutes 
>> of testing, sometimes instantly, and sometimes close to 10 minutes in.
>>
>> Target dmesg sample:
>> [ 3867.598007] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>> [ 3867.598384] nvmet: ctrl 1 fatal error occurred!
>>
>> Initiator dmesg sample:
>> <snip>
>> [  348.122160] nvme nvme4: I/O 86 QID 17 timeout
>> [  348.122224] nvme nvme4: I/O 87 QID 17 timeout
>> [  348.122290] nvme nvme4: I/O 88 QID 17 timeout
>> [  348.122354] nvme nvme4: I/O 89 QID 17 timeout
>> [  348.122417] nvme nvme4: I/O 90 QID 17 timeout
>> [  348.122480] nvme nvme4: I/O 91 QID 17 timeout
>> [  348.122544] nvme nvme4: I/O 92 QID 17 timeout
>> [  348.122607] nvme nvme4: I/O 93 QID 17 timeout
>> [  348.122670] nvme nvme4: I/O 94 QID 17 timeout
>> [  348.122733] nvme nvme4: I/O 95 QID 17 timeout
>> [  348.122796] nvme nvme4: I/O 96 QID 17 timeout
>> <snip>
>> [  380.387212] nvme nvme4: creating 24 I/O queues.
>> [  380.573925] nvme nvme4: Successfully reconnected (1 attempts)
>>
>> All the while the underlying connectivity is working just fine. 
>> There's a long delay between the timeout and the successful 
>> reconnect; I haven't timed it, but it seems like about 5 minutes. 
>> This has luckily given me plenty of time to test connectivity, which 
>> has consistently checked out fine on all fronts.
>
> Seems like a loss of connectivity from the driver's perspective.
> While this is happening, can you try an rdma application like
> ib_send_bw/ib_send_lat or something?
>
> I'd also suggest running both workloads concurrently and seeing if they
> both suffer from the connectivity issue; this will help determine
> whether this is something specific to the nvme-rdma driver.
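
For reference, a concurrent bandwidth test along those lines might look 
roughly like the following; the device name, target address, and duration 
are placeholders rather than values from the setup above:

  # on the target (server side), listen on the RDMA device
  ib_send_bw -d <rdma_device> -F -D 60

  # on the initiator (client side), point at the target's address
  ib_send_bw -d <rdma_device> -F -D 60 <target_ip>

Running that (and similarly ib_send_lat) alongside the fio job while the 
nvme I/O timeouts appear should show whether the raw RDMA path stalls at 
the same time.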
>
>>
>> I'm testing with a single Micron 9300 Pro 7.68TB right now, which can 
>> push about 850k read IOPS.  On the Intel target/initiator combo I can 
>> run it "balls to the walls" for hours on end with zero issues.  On the 
>> AMD initiator I can generally trigger the disconnect/drop within 5 
>> minutes.  Here's where things get weird: if I limit the test to 200K 
>> IOPS or less, then it's relatively stable on the AMD and I've not seen 
>> any drops while this limitation is in place.
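
(For anyone trying to reproduce this: a cap of that kind can be applied 
directly in fio with the rate_iops option, e.g. by appending it to the 
command line given earlier.  This is just an illustration of the limit, 
not necessarily the exact mechanism that was used:

  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=test --filename=/dev/nvme0n1 --bs=4k --iodepth=64 --size=10G \
      --readwrite=randread --time_based --runtime=1200 --rate_iops=200000

Everything else about the workload stays the same; only the issued IOPS 
are throttled.)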
>>
>> Here are some things I've tried which make no difference (or make 
>> things worse):
>>
>> Ubuntu 20.04.2 kernel 5.4
>> Ubuntu 20.04.2 kernel 5.8
>> Ubuntu 20.04.2 kernel 5.10
>> CentOS 8 kernel 4.18
>> CentOS 8 kernel 5.10 (from elrepo)
>> CentOS 8 kernel 5.12 (from elrepo) - whole system actually freezes 
>> upon "nvme connect" command on this one
>> With and without multipath (native)
>> With and without round-robin on multipath (native)
>> Different NVMe drive models
>> With and without PFC
>> 10G DAC
>> 25G DAC
>> 25G DAC negotiated at 10G
>> With and without a switch
>> iWARP and RoCEv2
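
For context on the multipath and "nvme connect" items above, the commands 
involved are roughly of the following shape; the transport address, NQN, 
and subsystem instance are placeholders, not the actual values from this 
setup:

  # connect the remote namespace over RDMA (4420 is the standard NVMe-oF port)
  nvme connect -t rdma -a <target_ip> -s 4420 -n <subsystem_nqn>

  # with native NVMe multipath enabled, switch the I/O policy to round-robin
  echo round-robin > /sys/class/nvme-subsystem/nvme-subsys0/iopolicy

As noted in the list, toggling these made no difference to the drops.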
>
> Looks like this probably always existed...
>
>>
>> I did do some testing with TCP/IP but cannot reach the >200k IOPS 
>> threshold with it, which seems to be important for triggering the 
>> issue.  I did not experience the drops with TCP/IP.
>>
>> I can't seem to draw any conclusion other than this being something 
>> specific to Zen 3, but I'm not sure why.  Is there somewhere I should 
>> be looking aside from "dmesg" to get some useful debug info?  
>> According to the irdma driver there are no rdma packets getting 
>> lost/dropped/erroring, etc.  Common things like rping and 
>> ib_read_bw/ib_write_bw tests all run indefinitely without error.
>
> Ah, that is an important detail.
>
> I think a packet sniffer can help here if this is the case.  IIRC
> there should be a way to sniff rdma traffic using tcpdump, but I don't
> remember the details.  Perhaps the Intel folks can help you there...
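
In the meantime, as a generic sanity check (not the sniffer suggested 
above), the NIC and RDMA counters can at least be watched on both hosts; 
the interface name here is a placeholder:

  # NIC-level counters (drops, discards, errors, pause frames, etc.)
  ethtool -S <iface> | grep -iE 'drop|discard|err'

  # RDMA-layer counters exposed via the iproute2 rdma tool
  rdma statistic show link

If those stay clean across a timeout event, that would line up with the 
irdma driver reporting no lost/dropped/erroring rdma packets.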

-- 
Jonathan Wright
KnownHost, LLC
https://www.knownhost.com



