nvme-tcp crashes the system when overloading the backend device.

Mark Ruijter mruijter at primelogic.nl
Mon Sep 6 05:25:04 PDT 2021


This is all the output I could still find from the test.
If you are still missing info, let me know; I will need to re-run the test in that case.

--Mark

On 06/09/2021, 13:12, "Sagi Grimberg" <sagi at grimberg.me> wrote:


    > Hi Sagi,
    > 
    > I installed a recent kernel on the system and restarted the test.
    > The kernel is: 5.10.57
    > 
    > Before the kernel could crash, I stopped the test by disconnecting the initiators.
    > "nvmetcli clear" did not hang and, in this case, still managed to remove the configuration.
    > 
    > # ls -l /sys/kernel/config/nvmet/ports/
    > total 0
    > # ls -l /sys/kernel/config/nvmet/subsystems/
    > total 0
    > 
    > However, after this, I still see nvmet_tcp_wq workers that are actively running:
    > # ps aux | grep nvmet
    > USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    > root       163  0.4  0.0      0     0 ?        I<   02:28   0:21 [kworker/24:0H-nvmet_tcp_wq]
    > root       199  0.0  0.0      0     0 ?        I<   02:28   0:02 [kworker/30:0H-nvmet_tcp_wq]
    > root       355  0.0  0.0      0     0 ?        I<   02:28   0:04 [kworker/56:0H-nvmet_tcp_wq]
    > root       361  0.0  0.0      0     0 ?        I<   02:28   0:03 [kworker/57:0H-nvmet_tcp_wq]
    > root       683  1.2  0.0      0     0 ?        D<   02:30   0:56 [kworker/53:1H+nvmet_tcp_wq]
    > root       785  1.0  0.0      0     0 ?        D<   02:30   0:44 [kworker/59:1H+nvmet_tcp_wq]
    > root      1200  0.1  0.0      0     0 ?        D<   02:30   0:08 [kworker/27:1H+nvmet_tcp_wq]
    > root     29212  2.0  0.0      0     0 ?        I<   03:28   0:18 [kworker/31:2H-nvmet_tcp_wq]
    > root     32691  0.0  0.0      0     0 ?        I<   02:31   0:00 [nvmet-buffered-]
    > root     39437  5.3  0.0      0     0 ?        D<   03:32   0:35 [kworker/51:3H+nvmet_tcp_wq]
    > root     39440  1.8  0.0      0     0 ?        I<   03:32   0:12 [kworker/59:3H-nvmet_tcp_wq]
    > root     39458 13.3  0.0      0     0 ?        I<   03:32   1:28 [kworker/18:3H-nvmet_tcp_wq]
    > root     39508  7.1  0.0      0     0 ?        D<   03:32   0:47 [kworker/53:4H+nvmet_tcp_wq]
    > root     39511  2.7  0.0      0     0 ?        D<   03:32   0:17 [kworker/28:5H+nvmet_tcp_wq]
    > root     39520  7.7  0.0      0     0 ?        D<   03:32   0:51 [kworker/52:3H+nvmet_tcp_wq]
    > root     39855  4.3  0.0      0     0 ?        I<   03:32   0:28 [kworker/48:4H-nvmet_tcp_wq]
    > root     39857  3.0  0.0      0     0 ?        D<   03:32   0:20 [kworker/28:7H+nvmet_tcp_wq]
    > root     39902  6.3  0.0      0     0 ?        D<   03:32   0:41 [kworker/27:6H+nvmet_tcp_wq]
    > root     39928  5.3  0.0      0     0 ?        D<   03:32   0:35 [kworker/25:9H+nvmet_tcp_wq]
    > root     39963  8.8  0.0      0     0 ?        D<   03:32   0:57 [kworker/24:6H+nvmet_tcp_wq]
    > root     40024  3.3  0.0      0     0 ?        I<   03:32   0:21 [kworker/28:9H-nvmet_tcp_wq]
    > root     40087  6.3  0.0      0     0 ?        I<   03:32   0:41 [kworker/53:6H-nvmet_tcp_wq]
    > root     40169  6.1  0.0      0     0 ?        D<   03:32   0:40 [kworker/59:5H+nvmet_tcp_wq]
    > root     40201  3.5  0.0      0     0 ?        D<   03:32   0:23 [kworker/54:8H+nvmet_tcp_wq]
    > root     40333  0.6  0.0      0     0 ?        D<   03:32   0:04 [kworker/59:7H+nvmet_tcp_wq]
    > root     40371  0.4  0.0      0     0 ?        I<   03:32   0:03 [kworker/49:5H-nvmet_tcp_wq]
    > root     40375  2.5  0.0      0     0 ?        I<   03:32   0:16 [kworker/20:8H-nvmet_tcp_wq]
    > root     40517  0.4  0.0      0     0 ?        I<   03:32   0:02 [kworker/58:6H-nvmet_tcp_wq]
    > root     40811  2.8  0.0      0     0 ?        D<   03:33   0:17 [kworker/51:9H+nvmet_tcp_wq]
    > root     40864  1.5  0.0      0     0 ?        I<   03:33   0:09 [kworker/29:5H-nvmet_tcp_wq]
    > root     40891  1.7  0.0      0     0 ?        I<   03:33   0:10 [kworker/17:9H-nvmet_tcp_wq]
    > root     40902  4.3  0.0      0     0 ?        D<   03:33   0:25 [kworker/59:8H+nvmet_tcp_wq]
    > root     41061  3.3  0.0      0     0 ?        I<   03:34   0:18 [kworker/51:10H-nvmet_tcp_wq]
    > root     41145  2.6  0.0      0     0 ?        D<   03:34   0:14 [kworker/56:7H+nvmet_tcp_wq]
    > root     41278  1.3  0.0      0     0 ?        I<   03:34   0:07 [kworker/22:9H-nvmet_tcp_wq]
    > 
    > I've attached dmesg.txt as requested.

    The dmesg output seems incomplete; I'm missing the nvmet-tcp threads 
    that are blocked. Do you have this output in the journal that you can share?
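
    As a minimal sketch of one way to capture the blocked-task backtraces
    being asked for here (assuming SysRq is available on this 5.10.57
    kernel; "kernel-log.txt" is only a placeholder name):

    # echo 1 > /proc/sys/kernel/sysrq
    # echo w > /proc/sysrq-trigger

    SysRq 'w' dumps backtraces of every task in uninterruptible (D) sleep,
    which is the state the stuck nvmet_tcp_wq workers appear in above. The
    traces land in the kernel ring buffer and the journal, from where they
    can be collected with:

    # journalctl -k -b > kernel-log.txt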

-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages.xz
Type: application/octet-stream
Size: 196640 bytes
Desc: messages.xz
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20210906/8c205d21/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: warn.xz
Type: application/octet-stream
Size: 61552 bytes
Desc: warn.xz
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20210906/8c205d21/attachment-0003.obj>

