nvme_tcp: nvme connect failed after running stress-ng: unshare

Sagi Grimberg sagi at grimberg.me
Mon Sep 14 19:50:12 EDT 2020


> Hello
> 
> Recently I found that nvme-tcp connects always fail [1] after running the
> stress-ng unshare stressor [2]. By bisecting I found the regression was
> introduced by commit [3]; connecting works again after reverting it.
> I'm not sure whether this is a test-case issue or a kernel issue; could
> anyone help check it?

Is this failure persistent or transient?
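
If it is transient, a simple retry loop should eventually get through;
something like the sketch below (reusing the connect parameters from your
trace; the 10 attempts / 5 second spacing are arbitrary) would tell us
which case this is:

# Hedged sketch: retry the same connect to see whether the failure
# clears up on its own (transient) or never succeeds (persistent).
for i in $(seq 1 10); do
    if nvme connect -t tcp -n nqn.2014-08.org.nvmexpress.discovery \
            -a 127.0.0.1 -s 4420; then
        echo "connect succeeded on attempt $i"
        exit 0
    fi
    sleep 5
done
echo "connect still failing after 10 attempts"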

> 
> [1]
> # sh test.sh
> + ./stress-ng/stress-ng --unshare 0 --timeout 5 --log-file unshare.log
> stress-ng: info:  [355534] dispatching hogs: 32 unshare
> stress-ng: info:  [355534] successful run completed in 5.04s
> + modprobe null-blk nr-devices=1
> + modprobe nvmet-tcp
> + modprobe nvme-tcp
> + nvmetcli restore tcp.json
> + nvme connect -t tcp -n nqn.2014-08.org.nvmexpress.discovery -a 127.0.0.1 -s 4420
> Failed to write to /dev/nvme-fabrics: Input/output error
> 
> # dmesg | tail -9
> [  700.012299] null_blk: module loaded
> [  700.073415] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [  700.073923] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [  715.291020] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [  715.297031] nvmet: ctrl 1 fatal error occurred!
> [  749.939898] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:e405e6bb-8e28-4a73-b338-3fddb5746b8c.
> [  763.417376] nvme nvme0: queue 0: timeout request 0x0 type 4
> [  763.422979] nvme nvme0: Connect command failed, error wo/DNR bit: 881
> [  763.429419] nvme nvme0: failed to connect queue: 0 ret=881
> 
> # uname -r
> 5.9.0-rc4
> 
> 
> [2] stress-ng: unshare case
> https://github.com/ColinIanKing/stress-ng.git
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-unshare.c
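> 
> The stressor rapidly creates and tears down namespaces in short-lived
> children. Assuming the IPC-namespace churn is the relevant part (that is
> what commit [3] touches), a rough approximation with util-linux
> unshare(1) would be:
> 
> # Rough sketch of the namespace churn: each child enters a fresh IPC
> # namespace and exits immediately, so the kernel frees namespaces back
> # to back. The iteration count is arbitrary.
> for i in $(seq 1 1000); do
>     unshare --ipc true &
> done
> wait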
> 
> 
> [3]
> commit e1eb26fa62d04ec0955432be1aa8722a97cb52e7
> Author: Giuseppe Scrivano <gscrivan at redhat.com>
> Date:   Sun Jun 7 21:40:10 2020 -0700
> 
>      ipc/namespace.c: use a work queue to free_ipc
>      
>      the reason is to avoid a delay caused by the synchronize_rcu() call in
>      kern_umount() when the mqueue mount is freed.
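> 
> The revert I tested was the plain one on top of 5.9.0-rc4, roughly:
> 
> # Revert the suspect commit and rebuild; with this in place the
> # connect in test.sh succeeds again.
> git revert e1eb26fa62d04ec0955432be1aa8722a97cb52e7
> make -j"$(nproc)" && make modules_install && make install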
> 
> 
> [4]
> # cat tcp.json
> {
>    "hosts": [],
>    "ports": [
>      {
>        "addr": {
>          "adrfam": "ipv4",
>          "traddr": "127.0.0.1",
>          "treq": "not specified",
>          "trsvcid": "4420",
>          "trtype": "tcp"
>        },
>        "portid": 0,
>        "referrals": [],
>        "subsystems": [
>          "blktests-subsystem-1"
>        ]
>      }
>    ],
>    "subsystems": [
>      {
>        "allowed_hosts": [],
>        "attr": {
>          "allow_any_host": "1",
>          "cntlid_max": "65519",
>          "cntlid_min": "1",
>          "model": "Linux",
>          "pi_enable": "0",
>          "serial": "7d833f5501f6b240",
>          "version": "1.3"
>        },
>        "namespaces": [
>          {
>            "device": {
>              "nguid": "00000000-0000-0000-0000-000000000000",
>              "path": "/dev/nullb0",
>              "uuid": "b07c7eef-8428-47bf-8e79-26ec8c30f334"
>            },
>            "enable": 1,
>            "nsid": 1
>          }
>        ],
>        "nqn": "blktests-subsystem-1"
>      }
>    ]
> }
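> 
> If it helps with reproducing, the target can be reset between runs with
> something like the following (standard nvme-cli/nvmetcli commands; the
> exact teardown steps are my assumption, not part of test.sh):
> 
> # Hedged sketch: tear down host connections and the target config so a
> # stale controller from an earlier attempt cannot affect the next run.
> nvme disconnect-all
> nvmetcli clear
> modprobe -r nvmet_tcp nvmet null_blk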
> 
> 
> Best Regards,
>    Yi Zhang
> 
> 


