nvme-tcp crashes the system when overloading the backend device.

Mark Ruijter mruijter at primelogic.nl
Thu Sep 2 04:31:45 PDT 2021


Hi Sagi,

I installed a recent kernel on the system and restarted the test.
The kernel is: 5.10.57

Before the kernel could crash, I stopped the test by disconnecting the initiators.
This time "nvmetcli clear" did not hang and still managed to remove the configuration.

# ls -l /sys/kernel/config/nvmet/ports/
total 0
# ls -l /sys/kernel/config/nvmet/subsystems/
total 0

However, after this I still see nvmet_tcp_wq workers that are actively running:
# ps aux | grep nvmet
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       163  0.4  0.0      0     0 ?        I<   02:28   0:21 [kworker/24:0H-nvmet_tcp_wq]
root       199  0.0  0.0      0     0 ?        I<   02:28   0:02 [kworker/30:0H-nvmet_tcp_wq]
root       355  0.0  0.0      0     0 ?        I<   02:28   0:04 [kworker/56:0H-nvmet_tcp_wq]
root       361  0.0  0.0      0     0 ?        I<   02:28   0:03 [kworker/57:0H-nvmet_tcp_wq]
root       683  1.2  0.0      0     0 ?        D<   02:30   0:56 [kworker/53:1H+nvmet_tcp_wq]
root       785  1.0  0.0      0     0 ?        D<   02:30   0:44 [kworker/59:1H+nvmet_tcp_wq]
root      1200  0.1  0.0      0     0 ?        D<   02:30   0:08 [kworker/27:1H+nvmet_tcp_wq]
root     29212  2.0  0.0      0     0 ?        I<   03:28   0:18 [kworker/31:2H-nvmet_tcp_wq]
root     32691  0.0  0.0      0     0 ?        I<   02:31   0:00 [nvmet-buffered-]
root     39437  5.3  0.0      0     0 ?        D<   03:32   0:35 [kworker/51:3H+nvmet_tcp_wq]
root     39440  1.8  0.0      0     0 ?        I<   03:32   0:12 [kworker/59:3H-nvmet_tcp_wq]
root     39458 13.3  0.0      0     0 ?        I<   03:32   1:28 [kworker/18:3H-nvmet_tcp_wq]
root     39508  7.1  0.0      0     0 ?        D<   03:32   0:47 [kworker/53:4H+nvmet_tcp_wq]
root     39511  2.7  0.0      0     0 ?        D<   03:32   0:17 [kworker/28:5H+nvmet_tcp_wq]
root     39520  7.7  0.0      0     0 ?        D<   03:32   0:51 [kworker/52:3H+nvmet_tcp_wq]
root     39855  4.3  0.0      0     0 ?        I<   03:32   0:28 [kworker/48:4H-nvmet_tcp_wq]
root     39857  3.0  0.0      0     0 ?        D<   03:32   0:20 [kworker/28:7H+nvmet_tcp_wq]
root     39902  6.3  0.0      0     0 ?        D<   03:32   0:41 [kworker/27:6H+nvmet_tcp_wq]
root     39928  5.3  0.0      0     0 ?        D<   03:32   0:35 [kworker/25:9H+nvmet_tcp_wq]
root     39963  8.8  0.0      0     0 ?        D<   03:32   0:57 [kworker/24:6H+nvmet_tcp_wq]
root     40024  3.3  0.0      0     0 ?        I<   03:32   0:21 [kworker/28:9H-nvmet_tcp_wq]
root     40087  6.3  0.0      0     0 ?        I<   03:32   0:41 [kworker/53:6H-nvmet_tcp_wq]
root     40169  6.1  0.0      0     0 ?        D<   03:32   0:40 [kworker/59:5H+nvmet_tcp_wq]
root     40201  3.5  0.0      0     0 ?        D<   03:32   0:23 [kworker/54:8H+nvmet_tcp_wq]
root     40333  0.6  0.0      0     0 ?        D<   03:32   0:04 [kworker/59:7H+nvmet_tcp_wq]
root     40371  0.4  0.0      0     0 ?        I<   03:32   0:03 [kworker/49:5H-nvmet_tcp_wq]
root     40375  2.5  0.0      0     0 ?        I<   03:32   0:16 [kworker/20:8H-nvmet_tcp_wq]
root     40517  0.4  0.0      0     0 ?        I<   03:32   0:02 [kworker/58:6H-nvmet_tcp_wq]
root     40811  2.8  0.0      0     0 ?        D<   03:33   0:17 [kworker/51:9H+nvmet_tcp_wq]
root     40864  1.5  0.0      0     0 ?        I<   03:33   0:09 [kworker/29:5H-nvmet_tcp_wq]
root     40891  1.7  0.0      0     0 ?        I<   03:33   0:10 [kworker/17:9H-nvmet_tcp_wq]
root     40902  4.3  0.0      0     0 ?        D<   03:33   0:25 [kworker/59:8H+nvmet_tcp_wq]
root     41061  3.3  0.0      0     0 ?        I<   03:34   0:18 [kworker/51:10H-nvmet_tcp_wq]
root     41145  2.6  0.0      0     0 ?        D<   03:34   0:14 [kworker/56:7H+nvmet_tcp_wq]
root     41278  1.3  0.0      0     0 ?        I<   03:34   0:07 [kworker/22:9H-nvmet_tcp_wq]
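
For reference, nvmet_tcp_wq is allocated as a high-priority workqueue, which is why its workers run in the per-CPU high-priority pools and show up as "kworker/N:xH" threads ("D<" in the STAT column means uninterruptible sleep in a high-priority thread). A minimal sketch of that allocation, written from memory of a 5.10-era drivers/nvme/target/tcp.c rather than quoted from the exact tree (the real init function also registers the transport):

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *nvmet_tcp_wq;

static int __init nvmet_tcp_init(void)
{
	/*
	 * WQ_HIGHPRI places queue->io_work on the per-CPU high-priority
	 * worker pools, hence the trailing "H" in the kworker names above.
	 */
	nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq", WQ_HIGHPRI, 0);
	if (!nvmet_tcp_wq)
		return -ENOMEM;
	return 0;
}

static void __exit nvmet_tcp_exit(void)
{
	destroy_workqueue(nvmet_tcp_wq);
}

module_init(nvmet_tcp_init);
module_exit(nvmet_tcp_exit);
MODULE_LICENSE("GPL v2");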

I've attached dmesg.txt as requested.

Thanks,
Mark

On 01/09/2021, 16:47, "Sagi Grimberg" <sagi at grimberg.me> wrote:


    > Hi Sagi,
    > 
    > I can reproduce this problem with any recent kernel.
    > At least all these kernels I tested suffer from the problem: 5.10.40, 5.10.57, 5.14-rc4 as well as SuSE SLES15-SP2 with kernel 5.3.18-24.37-default.
    > On the initiator I use Ubuntu 20.04 LTS with kernel 5.10.0-1019.

    Thanks.

    >> Is it possible to check if the R5 device has inflight commands? if not
    >> there is some race condition or misaccounting that prevents an orderly
    >> shutdown of the queues.
    > 
    > I will double check; however, I don't think that the underlying device is the problem.
    > The exact same test passes with the nvmet-rdma target.
    > It only fails with the nvmet-tcp target driver.

    OK, that is useful information.
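
For context, the "orderly shutdown of the queues" above boils down to nvmet_sq_destroy() waiting for every inflight command to drop its reference on the submission queue before the queue is torn down. A simplified sketch from memory of drivers/nvme/target/core.c (not the exact tree, and the admin-queue AER handling is omitted):

static void nvmet_confirm_sq(struct percpu_ref *ref)
{
	struct nvmet_sq *sq = container_of(ref, struct nvmet_sq, ref);

	complete(&sq->confirm_done);
}

void nvmet_sq_destroy(struct nvmet_sq *sq)
{
	/*
	 * Kill the percpu ref, then wait until every command that still
	 * holds a reference on this sq has completed.  If the backend
	 * never completes its inflight I/O, the second wait never returns
	 * and the teardown worker sits in D state.
	 */
	percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
	wait_for_completion(&sq->confirm_done);
	wait_for_completion(&sq->free_done);
	percpu_ref_exit(&sq->ref);
}

So Sagi's question is essentially whether the R5 device still has commands outstanding that would keep sq->free_done from completing.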

    > 
    > As far as I can tell I exhaust the budget in nvmet_tcp_io_work and requeue:
    > 
    > 1293         } while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
    > 1294
    > 1295         /*
    > 1296          * Requeue the worker if idle deadline period is in progress or any
    > 1297          * ops activity was recorded during the do-while loop above.
    > 1298          */
    > 1299         if (nvmet_tcp_check_queue_deadline(queue, ops) || pending)
    > 1300                 queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
    > 
    > I added pr_info statements in the code to determine what is going on:
    > 2021-09-01T07:15:26.944067-06:00 gold kernel: [ 5502.786914] nvmet_tcp: MARK exhausted budget: ret = 0, ops = 71
    > 2021-09-01T07:15:26.944070-06:00 gold kernel: [ 5502.787455] nvmet: ctrl 49 keep-alive timer (15 seconds) expired!
    > 2021-09-01T07:15:26.944072-06:00 gold kernel: [ 5502.787461] nvmet: ctrl 49 fatal error occurred!
    > 
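
A minimal sketch of where such a debug print could sit, right at the exit of the do/while loop quoted above; the exact placement and the reuse of the local ret/ops variables are an assumption reconstructed from the log line, not the actual patch:

	} while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);

	/* hypothetical debug print matching the "MARK exhausted budget" log line */
	if (ops >= NVMET_TCP_IO_WORK_BUDGET)
		pr_info("MARK exhausted budget: ret = %d, ops = %d\n", ret, ops);

	/*
	 * Requeue the worker if idle deadline period is in progress or any
	 * ops activity was recorded during the do-while loop above.
	 */
	if (nvmet_tcp_check_queue_deadline(queue, ops) || pending)
		queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);

With ops = 71 the budget check ends the loop, and as long as pending is still set or the idle-deadline check holds, io_work is immediately requeued on the same CPU, which is consistent with the continuously busy kworkers above.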
    > Shortly after the routine nvmet_fatal_error_handler gets triggered:
    > static void nvmet_fatal_error_handler(struct work_struct *work)
    > {
    >          struct nvmet_ctrl *ctrl =
    >                          container_of(work, struct nvmet_ctrl, fatal_err_work);
    > 
    >          pr_err("ctrl %d fatal error occurred!\n", ctrl->cntlid);
    >          ctrl->ops->delete_ctrl(ctrl);
    > }
    > 
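
The chain that leads there: the keep-alive work raises the fatal error, which sets CSTS.CFS and schedules fatal_err_work, whose handler is the function quoted above. A sketch of that path from memory of a 5.10-era drivers/nvme/target/core.c (details such as the traffic-based keep-alive check are omitted and may differ from the exact tree):

static void nvmet_keep_alive_timer(struct work_struct *work)
{
	struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
			struct nvmet_ctrl, ka_work);

	pr_err("ctrl %d keep-alive timer (%d seconds) expired!\n",
		ctrl->cntlid, ctrl->kato);

	nvmet_ctrl_fatal_error(ctrl);
}

void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl)
{
	mutex_lock(&ctrl->lock);
	if (!(ctrl->csts & NVME_CSTS_CFS)) {
		/* set Controller Fatal Status and kick the handler above */
		ctrl->csts |= NVME_CSTS_CFS;
		schedule_work(&ctrl->fatal_err_work);
	}
	mutex_unlock(&ctrl->lock);
}

ctrl->ops->delete_ctrl() is the transport hook, so for nvmet-tcp this is where the per-queue teardown starts, and in this report that teardown apparently never completes.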
    > Some of the nvmet_tcp_wq workers now keep running and the number of workers keeps increasing.
    > root      3686  3.3  0.0      0     0 ?        I<   07:31   0:29 [kworker/11:0H-nvmet_tcp_wq]
    > root      3689 12.0  0.0      0     0 ?        I<   07:31   1:43 [kworker/25:0H-nvmet_tcp_wq]
    > root      3695 12.0  0.0      0     0 ?        I<   07:31   1:43 [kworker/55:3H-nvmet_tcp_wq]
    > root      3699  5.0  0.0      0     0 ?        I<   07:31   0:43 [kworker/38:1H-nvmet_tcp_wq]
    > root      3704 11.5  0.0      0     0 ?        I<   07:31   1:39 [kworker/21:0H-nvmet_tcp_wq]
    > root      3708 12.1  0.0      0     0 ?        I<   07:31   1:44 [kworker/31:0H-nvmet_tcp_wq]
    > 
    > "nvmetcli clear" will no longer return after this and when you keep the initiators running the system eventually crashes.
    > 

    OK, so maybe some more information can help. When you first reproduce the
    hang, I would dump all the threads in the system to dmesg.

    So if you can do the following:
    1. reproduce the hang
    2. nvmetcli clear
    3. echo t > /proc/sysrq-trigger

    And share the dmesg output with us?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.txt.xz
Type: application/octet-stream
Size: 58372 bytes
Desc: dmesg.txt.xz
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20210902/e15728a6/attachment-0001.obj>

