Error when running fio against nvme-of rdma target (mlx5 driver)

Arthur Muller arthur_mueller at gmx.net
Wed Jan 31 01:18:37 PST 2024


This is a re-send of an earlier email that bounced from the old iommu
list; it now goes to the new iommu at lists.linux.dev address instead.
The first message was submitted to lore and is archived at:

https://lore.kernel.org/all/9a40e66eb8ffc48a2e3765cf77f49914d57c55e7.camel@gmx.net/




Dear all,

We've encountered a similar issue. In our case, we are using the Lustre
file system instead of NVMe-oF to connect our storage over the network.
Our setup involves an AMD EPYC 7282 machine paired with Mellanox
MT28908 cards. Following the guidelines in the Nvidia documentation:

https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes

we compiled the MLNX_EN 5.8 LTS driver with VMA support. We also tried
the latest MLNX_EN 23.10 driver and ran into the same issue.

We are currently running kernel 5.15.0. The problem first appeared
after upgrading our systems from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS
and, thus, from kernel 5.4 to 5.15. Unfortunately, we are not
completely sure which MLNX_EN driver version we ran before the
upgrade, but we strongly assume it was <= 5.8.

The failure appears to be initiated by the following error message:

mlx5_core 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x200020f758 flags=0x0020]

This error results in timeouts and failed read operations in the Lustre
file system on both the client and the storage side, potentially leading
to a complete storage outage.
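
For reference, a minimal sketch of one way to watch for this signature
during long runs (assuming util-linux "dmesg --follow"; the script is
purely illustrative, not something we actually deploy):

#!/usr/bin/env python3
# Tail the kernel log and record AMD-Vi IO_PAGE_FAULT events
# (device, domain, address, flags) with a wall-clock timestamp.
import re
import subprocess
from datetime import datetime

PATTERN = re.compile(
    r"(?P<dev>\S+): AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) address=(?P<addr>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)

# "dmesg --follow" blocks and prints new kernel messages as they arrive.
with subprocess.Popen(["dmesg", "--follow"],
                      stdout=subprocess.PIPE, text=True) as proc:
    for line in proc.stdout:
        m = PATTERN.search(line)
        if m:
            print(datetime.now().isoformat(),
                  m.group("dev"), "domain", m.group("domain"),
                  "address", m.group("addr"), "flags", m.group("flags"),
                  flush=True)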

The IO_PAGE_FAULT error is often, though not always, followed by a
"local protection error". The error tends to manifest after a prolonged
period of constant network activity, typically after roughly one day of
continuous reads and writes from a Postgres database on the file
system.

Regrettably, we have not been able to trigger the error on demand.

Kind regards,
Arthur Müller


On Wed, 2022-02-09 at 02:50 +0000, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup,
> using the mlx5 driver and we are wondering if anyone has seen
> anything similar/has any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over
> RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are
> exposed as a single NVMe fabrics device, one physical SSD per
> namespace.
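> 
> For context, that layout presumably corresponds to the standard nvmet
> configfs shape: one subsystem with one namespace per physical SSD,
> exposed on one RDMA port. A minimal illustrative sketch follows; the
> NQN, port address and device paths below are placeholders for
> illustration only.
> 
> # Illustrative only: one NVMe-oF subsystem, one namespace per physical
> # SSD, exposed on an RDMA port via the standard nvmet configfs layout.
> from pathlib import Path
> 
> NVMET = Path("/sys/kernel/config/nvmet")
> NQN = "nqn.2022-02.io.example:all-ssds"   # placeholder NQN
> 
> subsys = NVMET / "subsystems" / NQN
> subsys.mkdir(parents=True, exist_ok=True)
> (subsys / "attr_allow_any_host").write_text("1\n")
> 
> # One namespace per physical SSD (device paths are placeholders).
> for nsid in range(1, 13):
>     ns = subsys / "namespaces" / str(nsid)
>     ns.mkdir(exist_ok=True)
>     (ns / "device_path").write_text(f"/dev/nvme{nsid - 1}n1\n")
>     (ns / "enable").write_text("1\n")
> 
> # RDMA port; address and service id are placeholders.
> port = NVMET / "ports" / "1"
> port.mkdir(exist_ok=True)
> (port / "addr_trtype").write_text("rdma\n")
> (port / "addr_adrfam").write_text("ipv4\n")
> (port / "addr_traddr").write_text("192.168.0.10\n")
> (port / "addr_trsvcid").write_text("4420\n")
> 
> # Expose the subsystem on the port.
> (port / "subsystems" / NQN).symlink_to(subsys)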
> 
> When running an fio job that targets the fabrics devices directly (no
> filesystem; see the script at the end), within a minute or so we start
> seeing errors like this:
> 
> [  408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [  408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0):
> WC error: 4, Message: local protection error
> [  408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error
> cqe
> [  408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> [  408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8
> e2
> [  408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed
> with status local protection error (4)
> [  408.380235] nvme nvme15: starting error recovery
> [  408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [  408.380246] block nvme15n2: no usable path - requeuing I/O
> [  408.380284] block nvme15n5: no usable path - requeuing I/O
> [  408.380298] block nvme15n1: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380304] block nvme15n11: no usable path - requeuing I/O
> [  408.380330] block nvme15n1: no usable path - requeuing I/O
> [  408.380350] block nvme15n2: no usable path - requeuing I/O
> [  408.380371] block nvme15n6: no usable path - requeuing I/O
> [  408.380377] block nvme15n6: no usable path - requeuing I/O
> [  408.380382] block nvme15n4: no usable path - requeuing I/O
> [  408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [  408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [  415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  415.131898] nvmet: ctrl 1 fatal error occurred!
> 
> Occasionally, we've seen the following stack trace:
> 
> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted:
> P           OE     5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS
> R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48
> 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85
> f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90
> 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX:
> 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI:
> 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09:
> 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12:
> ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15:
> 0001000000000000
> [ 1158.521452] FS:  0000000000000000(0000) GS:ffff99a36c8c0000(0000)
> knlGS:0000000000000000
> [ 1158.529540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4:
> 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877]  amd_iommu_unmap+0x2c/0x40
> [ 1158.548653]  __iommu_unmap+0xc4/0x170
> [ 1158.552344]  iommu_unmap_fast+0xe/0x10
> [ 1158.556100]  __iommu_dma_unmap+0x85/0x120
> [ 1158.560115]  iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213]  dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225]  rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201]  nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944]  nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683]  __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398]  ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027]  process_one_work+0x220/0x3c0
> [ 1158.598038]  worker_thread+0x4d/0x3f0
> [ 1158.601696]  kthread+0x114/0x150
> [ 1158.604928]  ? process_one_work+0x3c0/0x3c0
> [ 1158.609114]  ? kthread_park+0x90/0x90
> [ 1158.612783]  ret_from_fork+0x22/0x30
> 
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
> 
> We found a possibly related bug report [1] that suggested disabling
> the IOMMU could help, but even after I disabled it (amd_iommu=off
> iommu=off) I still get errors (nvme IO timeouts). Another thread from
> 2016[2] suggested that disabling some kernel debug options could
> work around the "local protection error", but that didn't help either.
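> 
> Illustrative aside (not taken from either report): one quick way to
> sanity-check whether the IOMMU really ended up disabled after those
> parameters is to look at /proc/cmdline and /sys/kernel/iommu_groups:
> 
> from pathlib import Path
> 
> # Report the iommu-related parameters the kernel actually booted with.
> cmdline = Path("/proc/cmdline").read_text().split()
> print("iommu boot params:", [p for p in cmdline if "iommu" in p] or "none")
> 
> # No entries under /sys/kernel/iommu_groups usually means the IOMMU is
> # off; note that groups still exist in passthrough mode (iommu=pt).
> groups = list(Path("/sys/kernel/iommu_groups").iterdir())
> print("iommu groups present:", len(groups))
> 
> With amd_iommu=off the groups directory should come up empty.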
> 
> As far as I can tell, the disks themselves are healthy: running the
> same fio job directly against the physical devices works without
> issue.
> 
> Any suggestions are appreciated.
> 
> Thanks,
> Martin
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]:
> https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
> 
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
> 
> [file1]
> filename=/dev/nvme0n1
> 
> [file2]
> filename=/dev/nvme0n2
> 
> [file3]
> filename=/dev/nvme0n3
> 
> [file4]
> filename=/dev/nvme0n4
> 
> [file5]
> filename=/dev/nvme0n5
> 
> [file6]
> filename=/dev/nvme0n6
> 
> [file7]
> filename=/dev/nvme0n7
> 
> [file8]
> filename=/dev/nvme0n8
> 
> [file9]
> filename=/dev/nvme0n9
> 
> [file10]
> filename=/dev/nvme0n10
> 
> [file11]
> filename=/dev/nvme0n11
> 
> [file12]
> filename=/dev/nvme0n12
