[Bug 215157] New: nvme delete-ns over nvme-tcp to nvmet-tcp error

Sagi Grimberg sagi at grimberg.me
Wed Dec 8 05:05:40 PST 2021



On 11/29/21 8:01 PM, Keith Busch wrote:
> [adding Sagi & linux-nvme mailing list]
> 
> On Sat, Nov 27, 2021 at 11:24:57PM +0000, bugzilla-daemon at bugzilla.kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215157
> 
> <snip>
> 
>> Reproduction:
>>
>>
>> modprobe nvmet
>> modprobe nvmet-tcp
>> truncate --size 1G /tmp/foo1.img
>> DEV=$(losetup --find /tmp/foo1.img)
>>
>> mkdir /sys/kernel/config/nvmet/subsystems/testsubsystem
>> cd /sys/kernel/config/nvmet/subsystems/testsubsystem
>> echo 1 > attr_allow_any_host
>> mkdir namespaces/1
>> echo "${DEV}" > namespaces/1/device_path
>> echo 1 > namespaces/1/enable
>>
>> mkdir /sys/kernel/config/nvmet/ports/1
>> cd /sys/kernel/config/nvmet/ports/1
>> echo 4420 > addr_trsvcid
>> echo 127.0.0.1 > addr_traddr
>> echo tcp > addr_trtype
>> echo ipv4 > addr_adrfam
>> ln -s /sys/kernel/config/nvmet/subsystems/testsubsystem/ \
>>     /sys/kernel/config/nvmet/ports/1/subsystems/testsubsystem
>>
>> modprobe nvme-tcp
>> nvme connect -t tcp -n testsubsystem -a 127.0.0.1 -s 4420
>> nvme list
>> nvme delete-ns -n 1 /dev/nvmeX
>>
>>
>> Eventually the command will finish with "Interrupted", but the device will be
>> unusable after this.
>>
>> The nvme-tcp side shows the following errors in dmesg:
>>
>> [10638.139221] nvmet: creating nvm controller 1 for subsystem testsubsystem for
>> NQN nqn.2014-08.org.nvmexpress:uuid:88a968b7-a5d7-4549-aaa8-a31f1d25aa4b.
>> [10638.139677] nvme nvme2: creating 32 I/O queues.
>> [10638.141556] nvme nvme2: mapped 32/0/0 default/read/poll queues.
>> [10638.145884] nvme nvme2: new ctrl: NQN "testsubsystem", addr 127.0.0.1:4420
>> [10678.676734] nvmet_tcp: failed cmd 00000000fdc07f80 id 4100 opcode 13, data_len: 0
>> [10684.129665] nvme nvme2: queue 0: timeout request 0x0 type 4
> 
> The target sees the opcode 13 (0xd) for the delete namespace management
> command and correcty decides it is unsupported, so the request
> initialization fails. Here's where it looks like the target code goes
> wrong:
> 
> nvmet_tcp_try_recv_one
>    nvmet_tcp_done_recv_pdu
>      nvmet_req_init
>        __nvmet_req_complete
>          nvmet_tcp_queue_response <- cmd is added to the queue resp_list
>      nvmet_tcp_handle_req_failure() -> rcv_state set to NVMET_TCP_RECV_DATA, flags set to NVMET_TCP_F_INIT_FAILED

nvmet_tcp_handle_req_failure will not do anything if the command does
not carry any in-capsule data (which is the case here).
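
To make that concrete, below is a small standalone sketch (plain
userspace C, not the actual drivers/nvme/target/tcp.c code; the structs
and the is_write/data_len checks are simplified stand-ins, and the real
code also weighs the length against the port's inline_data_size) of the
failure handling described above: only a failed write that still has
in-capsule data on the wire is marked NVMET_TCP_F_INIT_FAILED and
switched to data receive; a zero-length command like delete-ns is not.

/*
 * Standalone sketch, NOT the kernel code: simplified stand-ins for
 * nvmet_tcp_handle_req_failure() and the queue/cmd state it touches.
 */
#include <stdio.h>
#include <stdbool.h>

enum rcv_state { RECV_PDU, RECV_DATA };   /* stand-in for NVMET_TCP_RECV_* */
#define F_INIT_FAILED 0x1                 /* stand-in for NVMET_TCP_F_INIT_FAILED */

struct cmd   { unsigned flags; unsigned data_len; bool is_write; };
struct queue { enum rcv_state rcv_state; };

static void handle_req_failure(struct queue *q, struct cmd *c)
{
        if (!c->is_write || c->data_len == 0) {
                /*
                 * Nothing to absorb from the wire: the error status was
                 * already queued by __nvmet_req_complete(), so just go
                 * back to waiting for the next PDU.
                 */
                q->rcv_state = RECV_PDU;
                return;
        }
        /*
         * The host will still send in-capsule data for this failed
         * command: mark it and drain the payload before responding.
         */
        c->flags |= F_INIT_FAILED;
        q->rcv_state = RECV_DATA;
}

int main(void)
{
        struct queue q = { RECV_PDU };
        /* delete-ns (opcode 0xd) carries no payload -> data_len is 0 */
        struct cmd delete_ns = { .flags = 0, .data_len = 0, .is_write = true };

        handle_req_failure(&q, &delete_ns);
        printf("flags=%#x, rcv_state=%d\n", delete_ns.flags, q.rcv_state);
        return 0;
}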

>    nvmet_tcp_try_recv_data
>      nvmet_tcp_execute_request <- sees NVMET_TCP_F_INIT_FAILED
>        nvmet_tcp_queue_response <- cmd is added to the queue resp_list a 2nd time

There is a subtle case here: if the command we are queueing back is the
one currently being processed (which is the case when we want to fail a
command for which we still need to accept the in-capsule data), we don't
attach it to the resp_list. I do agree this is convoluted and needs
cleaner failure handling...
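
Roughly, that guard behaves like the standalone model below (again not
the real nvmet_tcp_queue_response(); the single-linked list and the
fields are simplified stand-ins for the lock-less resp_list and the
command state): when the completed command is the one whose PDU we are
still parsing and it still expects in-capsule data, we return without
queueing it, and it only gets queued once the payload has been drained.

/*
 * Standalone sketch, NOT the kernel code: a simplified model of the
 * "current command still expecting in-capsule data" guard in the
 * response queueing path.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

struct cmd {
        struct cmd *next;      /* stand-in for the llist lentry */
        const char *name;
        unsigned    data_len;
        bool        is_write;
};

struct queue {
        struct cmd *resp_list;     /* stand-in for queue->resp_list */
        struct cmd *current_cmd;   /* command whose PDU is being parsed */
        bool        receiving_pdu; /* rcv_state == NVMET_TCP_RECV_PDU */
};

static void queue_response(struct queue *q, struct cmd *c)
{
        /*
         * The subtle case: this completion belongs to the command we
         * are still parsing and the host will still send in-capsule
         * data for it, so don't attach it to resp_list yet; it is
         * queued again once the payload has been received.
         */
        if (c == q->current_cmd && q->receiving_pdu &&
            c->is_write && c->data_len > 0)
                return;

        c->next = q->resp_list;    /* stand-in for llist_add() */
        q->resp_list = c;
}

int main(void)
{
        struct cmd delete_ns = { .name = "delete-ns", .data_len = 0,   .is_write = true };
        struct cmd bad_write = { .name = "bad write", .data_len = 512, .is_write = true };
        struct queue q = { .resp_list = NULL, .receiving_pdu = true };

        /* delete-ns has no in-capsule data: the guard does not apply,
         * so its response is queued straight from the completion path */
        q.current_cmd = &delete_ns;
        queue_response(&q, &delete_ns);

        /* a failed write with in-capsule data is deferred on this call */
        q.current_cmd = &bad_write;
        queue_response(&q, &bad_write);

        for (struct cmd *c = q.resp_list; c; c = c->next)
                printf("queued response for %s\n", c->name);
        return 0;
}

Note that for delete-ns, with data_len 0, the guard does not apply, so
in this model the response is queued from the first completion.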

> 
> That can't be right: the cmd shouldn't be added to the response list
> twice.
> 
> Sagi, am I missing something here?
> 
>> [10684.129673] nvme nvme2: starting error recovery
>> [10684.131074] nvme nvme2: failed nvme_keep_alive_end_io error=10
>> [10684.137695] nvme nvme2: Reconnecting in 10 seconds...
>> [10691.809468] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>> [10691.809519] nvmet: ctrl 1 fatal error occurred!
>> [10756.832004] nvme nvme2: queue 0: timeout request 0x0 type 4
>> [10756.832137] nvme nvme2: Connect command failed, error wo/DNR bit: 881
>> [10756.832199] nvme nvme2: failed to connect queue: 0 ret=881
>> [10756.832222] nvme nvme2: Failed reconnect attempt 1
>> [10756.832225] nvme nvme2: Reconnecting in 10 seconds...
>> [10814.174824] INFO: task kworker/11:2:11273 blocked for more than 122 seconds.
>> [10814.174832]       Tainted: G            E     5.16.0-rc2 #1
>> [10814.174834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>> [10814.174835] task:kworker/11:2    state:D stack:    0 pid:11273 ppid:     2
>> flags:0x00004000
>> [10814.174841] Workqueue: events nvmet_tcp_release_queue_work [nvmet_tcp]
>> [10814.174850] Call Trace:
>> [10814.174852]  <TASK>
>> [10814.174855]  __schedule+0x23f/0xe50
>> [10814.174861]  ? update_load_avg+0x77/0x4d0
>> [10814.174866]  schedule+0x50/0xd0
>> [10814.174869]  schedule_timeout+0x99/0xd0
>> [10814.174873]  ? raw_spin_rq_lock_nested+0x19/0x80
>> [10814.174876]  wait_for_completion+0x74/0xc0
>> [10814.174880]  __flush_work.isra.0+0x14b/0x210
>> [10814.174884]  ? flush_workqueue_prep_pwqs+0x110/0x110
>> [10814.174887]  nvmet_tcp_release_queue_work+0xc3/0x330 [nvmet_tcp]
>> [10814.174892]  process_one_work+0x1ad/0x310
>> [10814.174895]  worker_thread+0x48/0x3d0
>> [10814.174899]  ? process_one_work+0x310/0x310
>> [10814.174901]  kthread+0x155/0x180
>> [10814.174904]  ? set_kthread_struct+0x30/0x30
>> [10814.174907]  ret_from_fork+0x1f/0x30
>> [10814.174912]  </TASK>
>> [10830.558291] nvme nvme2: queue 0: timeout request 0x0 type 4
>> [10830.558422] nvme nvme2: Connect command failed, error wo/DNR bit: 881
>> [10830.558484] nvme nvme2: failed to connect queue: 0 ret=881
>> [10830.558507] nvme nvme2: Failed reconnect attempt 2
>> [10830.558510] nvme nvme2: Reconnecting in 10 seconds...
>> [10904.284728] nvme nvme2: queue 0: timeout request 0x0 type 4
>> [10904.284757] nvme nvme2: Connect command failed, error wo/DNR bit: 881
>> [10904.284818] nvme nvme2: failed to connect queue: 0 ret=881
>> [10904.284842] nvme nvme2: Failed reconnect attempt 3
>> [10904.284845] nvme nvme2: Reconnecting in 10 seconds...

The issue here seems to be that the controller fails the command but it
still times out for the host - which suggests that the error response is
not actually queued.
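
If Keith's reading above is right and the same cmd really can end up on
resp_list twice, that by itself would explain a response that never goes
out: a second llist_add() of a node that is already on the list creates
a cycle. The minimal userspace illustration below (not the nvmet-tcp
code) shows what that does to a drain loop of the kind the send path
runs.

/*
 * Minimal userspace illustration, NOT the nvmet-tcp code: pushing the
 * same node onto a lock-less single-linked list twice creates a cycle,
 * so a drain loop over the list never terminates and the real response
 * for the command is effectively lost.
 */
#include <stdio.h>
#include <stddef.h>

struct node { struct node *next; const char *name; };

static struct node *list_first;

/* simplified, single-threaded stand-in for llist_add() */
static void push(struct node *n)
{
        n->next = list_first;
        list_first = n;
}

int main(void)
{
        struct node resp = { .next = NULL, .name = "delete-ns status" };
        int hops = 0;

        push(&resp);   /* queued once from the completion path            */
        push(&resp);   /* queued a second time: now resp.next == &resp    */

        for (struct node *n = list_first; n; n = n->next) {
                printf("would send: %s\n", n->name);
                if (++hops > 3) {   /* cap the loop so the demo terminates */
                        printf("...cycle: this drain loop would never finish\n");
                        break;
                }
        }
        return 0;
}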


