Error when running fio against nvme-of rdma target (mlx5 driver)
Martin Oliveira
Martin.Oliveira at eideticom.com
Tue Feb 8 18:50:35 PST 2022
Hello,
We have been hitting an error when running I/O over our nvme-of setup, which uses the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
Both the initiator and the target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device with one physical SSD per namespace.
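For anyone trying to reproduce the topology, a rough sketch of this kind of nvmet/RDMA setup is below, done directly through the nvmet configfs interface and nvme-cli. The subsystem NQN "testnqn", the IP address and the device paths are placeholders, not our exact configuration:

# target: one subsystem, one namespace per physical SSD (repeat for namespaces 2..12)
mkdir /sys/kernel/config/nvmet/subsystems/testnqn
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/attr_allow_any_host
mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo -n /dev/nvme0n1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/enable

# target: export the subsystem on an RDMA port
mkdir /sys/kernel/config/nvmet/ports/1
echo rdma > /sys/kernel/config/nvmet/ports/1/addr_trtype
echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
echo 192.168.1.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/testnqn /sys/kernel/config/nvmet/ports/1/subsystems/testnqn

# initiator: connect over RDMA
nvme connect -t rdma -n testnqn -a 192.168.1.1 -s 4420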
When running an fio job that targets the fabrics devices directly (no filesystem; see the script at the end), we start seeing errors like these within a minute or so:
[ 408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[ 408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[ 408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
[ 408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
[ 408.380235] nvme nvme15: starting error recovery
[ 408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
[ 408.380246] block nvme15n2: no usable path - requeuing I/O
[ 408.380284] block nvme15n5: no usable path - requeuing I/O
[ 408.380298] block nvme15n1: no usable path - requeuing I/O
[ 408.380304] block nvme15n11: no usable path - requeuing I/O
[ 408.380304] block nvme15n11: no usable path - requeuing I/O
[ 408.380330] block nvme15n1: no usable path - requeuing I/O
[ 408.380350] block nvme15n2: no usable path - requeuing I/O
[ 408.380371] block nvme15n6: no usable path - requeuing I/O
[ 408.380377] block nvme15n6: no usable path - requeuing I/O
[ 408.380382] block nvme15n4: no usable path - requeuing I/O
[ 408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[ 408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
[ 415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[ 415.131898] nvmet: ctrl 1 fatal error occurred!
Occasionally, we've seen the following stack trace:
[ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
[ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
[ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P OE 5.13.0-eid-athena-g6fb4e704d11c-dirty #14
[ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
[ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
[ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
[ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
[ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
[ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
[ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
[ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
[ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
[ 1158.521452] FS: 0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
[ 1158.529540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
[ 1158.542419] Call Trace:
[ 1158.544877] amd_iommu_unmap+0x2c/0x40
[ 1158.548653] __iommu_unmap+0xc4/0x170
[ 1158.552344] iommu_unmap_fast+0xe/0x10
[ 1158.556100] __iommu_dma_unmap+0x85/0x120
[ 1158.560115] iommu_dma_unmap_sg+0x95/0x110
[ 1158.564213] dma_unmap_sg_attrs+0x42/0x50
[ 1158.568225] rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
[ 1158.573201] nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
[ 1158.578944] nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
[ 1158.584683] __ib_process_cq+0x8e/0x150 [ib_core]
[ 1158.589398] ib_cq_poll_work+0x2b/0x80 [ib_core]
[ 1158.594027] process_one_work+0x220/0x3c0
[ 1158.598038] worker_thread+0x4d/0x3f0
[ 1158.601696] kthread+0x114/0x150
[ 1158.604928] ? process_one_work+0x3c0/0x3c0
[ 1158.609114] ? kthread_park+0x90/0x90
[ 1158.612783] ret_from_fork+0x22/0x30
We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after we disabled it (amd_iommu=off iommu=off) we still get errors (nvme I/O timeouts). Another thread from 2016 [2] suggested that disabling some kernel debug options could work around the "local protection error", but that didn't help either.
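For completeness, the IOMMU state after booting with those parameters can be sanity-checked along these lines (the exact dmesg wording varies between kernel versions):

cat /proc/cmdline                # confirm amd_iommu=off iommu=off were actually applied
dmesg | grep -iE 'AMD-Vi|iommu'  # the IOMMU should report as disabled / not initialized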
As far as we can tell, the disks themselves are fine: running the same fio job directly against the physical devices completes without errors.
Any suggestions are appreciated.
Thanks,
Martin
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
[2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
fio script:
[global]
name=fio-seq-write
rw=write
bs=1M
direct=1
numjobs=32
time_based
group_reporting=1
runtime=18000
end_fsync=1
size=10G
ioengine=libaio
iodepth=16
[file1]
filename=/dev/nvme0n1
[file2]
filename=/dev/nvme0n2
[file3]
filename=/dev/nvme0n3
[file4]
filename=/dev/nvme0n4
[file5]
filename=/dev/nvme0n5
[file6]
filename=/dev/nvme0n6
[file7]
filename=/dev/nvme0n7
[file8]
filename=/dev/nvme0n8
[file9]
filename=/dev/nvme0n9
[file10]
filename=/dev/nvme0n10
[file11]
filename=/dev/nvme0n11
[file12]
filename=/dev/nvme0n12
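(The job file above is launched as-is, e.g. "fio fio-seq-write.fio". For the comparison run mentioned earlier, the same job file is used with the filename= lines pointed at the physical devices instead of the fabrics namespaces.)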