v5.9-rc1 nvme_rdma regression?

Bart Van Assche bvanassche at acm.org
Sat Jun 20 22:40:00 EDT 2020


Hi Christoph,

The following test:

# (cd ~bart/software/blktests && ./check -q nvmeof-mp/001)
nvmeof-mp/001 (Log in and log out)                           [passed]
    runtime  2.190s  ...  1.664s

passes with kernel versions v5.7 and v5.8-rc1 but fails systematically
with Jens' for-next branch (commit 35e6ecf1bc49).

The output I see in the kernel log with Jens' for-next branch is shown
below. As the output shows, I added a few pr_debug() calls in the
nvme_rdma kernel driver and enabled dyndbg for that driver:
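(For reference, the dyndbg output below can be enabled with something like the
following; this assumes debugfs is mounted at /sys/kernel/debug and the modules
are already loaded:)

```shell
# Enable pr_debug() output for the host-side and target-side RDMA drivers.
echo 'module nvme_rdma +p'  > /sys/kernel/debug/dynamic_debug/control
echo 'module nvmet_rdma +p' > /sys/kernel/debug/dynamic_debug/control
echo 'module nvmet +p'      > /sys/kernel/debug/dynamic_debug/control
```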

[  613.593929] run blktests nvmeof-mp/001 at 2020-06-20 18:17:21
[  613.857891] null_blk: module loaded
[  614.102013] rdma_rxe: loaded
[  614.298471] infiniband rxe0: set active
[  614.301401] infiniband rxe0: added eth0
[  614.558713] lo speed is unknown, defaulting to 1000
[  614.559518] lo speed is unknown, defaulting to 1000
[  614.560410] lo speed is unknown, defaulting to 1000
[  614.753406] infiniband rxe1: set active
[  614.753654] lo speed is unknown, defaulting to 1000
[  614.754028] infiniband rxe1: added lo
[  614.786663] lo speed is unknown, defaulting to 1000
[  615.244903] nvmet: adding nsid 1 to subsystem nvme-test
[  615.401282] nvmet_rdma: enabling port 1 (192.168.122.18:7777)
[  615.744908] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: address resolved (0): status 0 id ffff8881e5c9a000
[  615.753098] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: route resolved  (2): status 0 id ffff8881e5c9a000
[  615.755598] nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: connect request (4): status 0 id ffff8881f3088000
[  615.757012] nvmet_rdma:nvmet_rdma_find_get_device: nvmet_rdma: added rxe0.
[  615.761543] nvmet_rdma:nvmet_rdma_create_queue_ib: nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 127 max_sge= 32 sq_size = 289 cm_id= ffff8881f3088000
[  615.764771] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: established (9): status 0 id ffff8881e5c9a000
[  615.765866] nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: established (9): status 0 id ffff8881f3088000
[  615.770994] nvmet:nvmet_start_keep_alive_timer: nvmet: ctrl 1 start keep-alive timer for 15 secs
[  615.771700] nvmet: creating controller 1 for subsystem nvme-test for NQN nqn.2014-08.org.nvmexpress:uuid:21addb1b-59b9-4c9f-9ae8-7d248855c72e.
[  631.081023] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[  631.083341] nvmet: ctrl 1 fatal error occurred!
[  631.084647] nvmet_rdma:__nvmet_rdma_queue_disconnect: nvmet_rdma: cm_id= ffff8881f3088000 queue->state= 1
[  631.090141] nvmet_rdma:nvmet_rdma_free_queue: nvmet_rdma: freeing queue 0
[  631.090510] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: disconnected (10): status 0 id ffff8881e5c9a000
[  631.094056] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: disconnect received - connection closed
[  631.170795] nvmet:nvmet_stop_keep_alive_timer: nvmet: ctrl 1 stop keep-alive
[  675.881359] nvme nvme0: I/O 0 QID 0 timeout
[  675.921369] nvme_core:nvme_cancel_request: nvme nvme0: Cancelling I/O 0
[  675.922506] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[  675.924930] nvme nvme0: failed to connect queue: 0 ret=881
[  676.045554] nvme_rdma:nvme_rdma_create_ctrl: nvme_rdma: nvme_rdma_setup_ctrl() returned 881
[  676.054748] nvme_rdma:nvme_rdma_create_ctrl: nvme_rdma: Changing return value 881 into -EIO
[  677.905855] rdma_rxe: not configured on eth0
[  677.919756] rdma_rxe: not configured on lo
[  678.824539] rdma_rxe: unloaded

Reverting the following two commits on top of Jens' for-next branch makes
this test pass again:
* nvme-rdma: factor out a nvme_rdma_end_request helper
* nvme: use blk_mq_complete_request_remote to avoid an indirect function call

Would you like to analyze this further, or do you expect me to attempt
a fix?

Thanks,

Bart.



More information about the Linux-nvme mailing list