v5.9-rc1 nvme_rdma regression?
Bart Van Assche
bvanassche at acm.org
Sat Jun 20 22:40:00 EDT 2020
Hi Christoph,
The following test:
# (cd ~bart/software/blktests && ./check -q nvmeof-mp/001)
nvmeof-mp/001 (Log in and log out) [passed]
runtime 2.190s ... 1.664s
passes with kernel versions v5.7 and v5.8-rc1 but fails consistently
with Jens' for-next branch (commit 35e6ecf1bc49).
With Jens' for-next branch the kernel log shows the following (as you can
see, I added a few pr_debug() calls to the nvme_rdma and nvmet_rdma
drivers and enabled dyndbg for both; see the note after the log):
[ 613.593929] run blktests nvmeof-mp/001 at 2020-06-20 18:17:21
[ 613.857891] null_blk: module loaded
[ 614.102013] rdma_rxe: loaded
[ 614.298471] infiniband rxe0: set active
[ 614.301401] infiniband rxe0: added eth0
[ 614.558713] lo speed is unknown, defaulting to 1000
[ 614.559518] lo speed is unknown, defaulting to 1000
[ 614.560410] lo speed is unknown, defaulting to 1000
[ 614.753406] infiniband rxe1: set active
[ 614.753654] lo speed is unknown, defaulting to 1000
[ 614.754028] infiniband rxe1: added lo
[ 614.786663] lo speed is unknown, defaulting to 1000
[ 615.244903] nvmet: adding nsid 1 to subsystem nvme-test
[ 615.401282] nvmet_rdma: enabling port 1 (192.168.122.18:7777)
[ 615.744908] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: address resolved (0): status 0 id ffff8881e5c9a000
[ 615.753098] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: route resolved (2): status 0 id ffff8881e5c9a000
[ 615.755598] nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: connect request (4): status 0 id ffff8881f3088000
[ 615.757012] nvmet_rdma:nvmet_rdma_find_get_device: nvmet_rdma: added rxe0.
[ 615.761543] nvmet_rdma:nvmet_rdma_create_queue_ib: nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 127 max_sge= 32 sq_size = 289 cm_id= ffff8881f3088000
[ 615.764771] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: established (9): status 0 id ffff8881e5c9a000
[ 615.765866] nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: established (9): status 0 id ffff8881f3088000
[ 615.770994] nvmet:nvmet_start_keep_alive_timer: nvmet: ctrl 1 start keep-alive timer for 15 secs
[ 615.771700] nvmet: creating controller 1 for subsystem nvme-test for NQN nqn.2014-08.org.nvmexpress:uuid:21addb1b-59b9-4c9f-9ae8-7d248855c72e.
[ 631.081023] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[ 631.083341] nvmet: ctrl 1 fatal error occurred!
[ 631.084647] nvmet_rdma:__nvmet_rdma_queue_disconnect: nvmet_rdma: cm_id= ffff8881f3088000 queue->state= 1
[ 631.090141] nvmet_rdma:nvmet_rdma_free_queue: nvmet_rdma: freeing queue 0
[ 631.090510] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: disconnected (10): status 0 id ffff8881e5c9a000
[ 631.094056] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: disconnect received - connection closed
[ 631.170795] nvmet:nvmet_stop_keep_alive_timer: nvmet: ctrl 1 stop keep-alive
[ 675.881359] nvme nvme0: I/O 0 QID 0 timeout
[ 675.921369] nvme_core:nvme_cancel_request: nvme nvme0: Cancelling I/O 0
[ 675.922506] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[ 675.924930] nvme nvme0: failed to connect queue: 0 ret=881
[ 676.045554] nvme_rdma:nvme_rdma_create_ctrl: nvme_rdma: nvme_rdma_setup_ctrl() returned 881
[ 676.054748] nvme_rdma:nvme_rdma_create_ctrl: nvme_rdma: Changing return value 881 into -EIO
[ 677.905855] rdma_rxe: not configured on eth0
[ 677.919756] rdma_rxe: not configured on lo
[ 678.824539] rdma_rxe: unloaded
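For reference, the pr_debug() output above was enabled through dynamic
debug, roughly as follows (assuming debugfs is mounted at
/sys/kernel/debug):

# echo 'module nvme_rdma +p'  > /sys/kernel/debug/dynamic_debug/control
# echo 'module nvmet_rdma +p' > /sys/kernel/debug/dynamic_debug/control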
From the log it looks like the RDMA connection is established but the
fabrics Connect command on the admin queue (QID 0) never completes on the
host side, so no keep-alive commands are sent and the target's keep-alive
timer expires. Reverting the following two commits on top of Jens'
for-next branch makes this test pass again:
* nvme-rdma: factor out a nvme_rdma_end_request helper
* nvme: use blk_mq_complete_request_remote to avoid an indirect function call
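For context, the second commit moves the NVMe drivers to a pattern
roughly like the sketch below. This is my paraphrase of the
blk_mq_complete_request_remote() calling convention, not the actual
nvme-rdma code: the function returns false when the request has to be
completed in the caller's context, so a dropped return value would leave
the request uncompleted, which would match the QID 0 timeout above.

	/*
	 * Sketch of the calling convention only:
	 * blk_mq_complete_request_remote() returns false when the
	 * caller must invoke the ->complete handler itself.
	 */
	if (!blk_mq_complete_request_remote(rq))
		nvme_rdma_complete_rq(rq);	/* complete locally */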
Do you want to analyze this further, or would you prefer that I try to
come up with a fix?
Thanks,
Bart.