nvme-tcp crashes the system when overloading the backend device.

Sagi Grimberg sagi at grimberg.me
Mon Sep 6 04:12:18 PDT 2021


> Hi Sagi,
> 
> I installed a recent kernel on the system and restarted the test.
> The kernel is: 5.10.57
> 
> Before the kernel would crash, I stopped the test by disconnecting the initiators.
> "nvmetcli clear" did not hang and in this case, it still managed to remove the configuration.
> 
> # ls -l /sys/kernel/config/nvmet/ports/
> total 0
> # ls -l /sys/kernel/config/nvmet/subsystems/
> total 0
> 
> However, after this I still see nvmet_tcp_wq workers that are actively running:
> # ps aux | grep nvmet
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root       163  0.4  0.0      0     0 ?        I<   02:28   0:21 [kworker/24:0H-nvmet_tcp_wq]
> root       199  0.0  0.0      0     0 ?        I<   02:28   0:02 [kworker/30:0H-nvmet_tcp_wq]
> root       355  0.0  0.0      0     0 ?        I<   02:28   0:04 [kworker/56:0H-nvmet_tcp_wq]
> root       361  0.0  0.0      0     0 ?        I<   02:28   0:03 [kworker/57:0H-nvmet_tcp_wq]
> root       683  1.2  0.0      0     0 ?        D<   02:30   0:56 [kworker/53:1H+nvmet_tcp_wq]
> root       785  1.0  0.0      0     0 ?        D<   02:30   0:44 [kworker/59:1H+nvmet_tcp_wq]
> root      1200  0.1  0.0      0     0 ?        D<   02:30   0:08 [kworker/27:1H+nvmet_tcp_wq]
> root     29212  2.0  0.0      0     0 ?        I<   03:28   0:18 [kworker/31:2H-nvmet_tcp_wq]
> root     32691  0.0  0.0      0     0 ?        I<   02:31   0:00 [nvmet-buffered-]
> root     39437  5.3  0.0      0     0 ?        D<   03:32   0:35 [kworker/51:3H+nvmet_tcp_wq]
> root     39440  1.8  0.0      0     0 ?        I<   03:32   0:12 [kworker/59:3H-nvmet_tcp_wq]
> root     39458 13.3  0.0      0     0 ?        I<   03:32   1:28 [kworker/18:3H-nvmet_tcp_wq]
> root     39508  7.1  0.0      0     0 ?        D<   03:32   0:47 [kworker/53:4H+nvmet_tcp_wq]
> root     39511  2.7  0.0      0     0 ?        D<   03:32   0:17 [kworker/28:5H+nvmet_tcp_wq]
> root     39520  7.7  0.0      0     0 ?        D<   03:32   0:51 [kworker/52:3H+nvmet_tcp_wq]
> root     39855  4.3  0.0      0     0 ?        I<   03:32   0:28 [kworker/48:4H-nvmet_tcp_wq]
> root     39857  3.0  0.0      0     0 ?        D<   03:32   0:20 [kworker/28:7H+nvmet_tcp_wq]
> root     39902  6.3  0.0      0     0 ?        D<   03:32   0:41 [kworker/27:6H+nvmet_tcp_wq]
> root     39928  5.3  0.0      0     0 ?        D<   03:32   0:35 [kworker/25:9H+nvmet_tcp_wq]
> root     39963  8.8  0.0      0     0 ?        D<   03:32   0:57 [kworker/24:6H+nvmet_tcp_wq]
> root     40024  3.3  0.0      0     0 ?        I<   03:32   0:21 [kworker/28:9H-nvmet_tcp_wq]
> root     40087  6.3  0.0      0     0 ?        I<   03:32   0:41 [kworker/53:6H-nvmet_tcp_wq]
> root     40169  6.1  0.0      0     0 ?        D<   03:32   0:40 [kworker/59:5H+nvmet_tcp_wq]
> root     40201  3.5  0.0      0     0 ?        D<   03:32   0:23 [kworker/54:8H+nvmet_tcp_wq]
> root     40333  0.6  0.0      0     0 ?        D<   03:32   0:04 [kworker/59:7H+nvmet_tcp_wq]
> root     40371  0.4  0.0      0     0 ?        I<   03:32   0:03 [kworker/49:5H-nvmet_tcp_wq]
> root     40375  2.5  0.0      0     0 ?        I<   03:32   0:16 [kworker/20:8H-nvmet_tcp_wq]
> root     40517  0.4  0.0      0     0 ?        I<   03:32   0:02 [kworker/58:6H-nvmet_tcp_wq]
> root     40811  2.8  0.0      0     0 ?        D<   03:33   0:17 [kworker/51:9H+nvmet_tcp_wq]
> root     40864  1.5  0.0      0     0 ?        I<   03:33   0:09 [kworker/29:5H-nvmet_tcp_wq]
> root     40891  1.7  0.0      0     0 ?        I<   03:33   0:10 [kworker/17:9H-nvmet_tcp_wq]
> root     40902  4.3  0.0      0     0 ?        D<   03:33   0:25 [kworker/59:8H+nvmet_tcp_wq]
> root     41061  3.3  0.0      0     0 ?        I<   03:34   0:18 [kworker/51:10H-nvmet_tcp_wq]
> root     41145  2.6  0.0      0     0 ?        D<   03:34   0:14 [kworker/56:7H+nvmet_tcp_wq]
> root     41278  1.3  0.0      0     0 ?        I<   03:34   0:07 [kworker/22:9H-nvmet_tcp_wq]
> 
> I've attached dmesg.txt as requested.

The dmesg output seems incomplete: it does not show the nvmet-tcp threads
that are blocked. Do you have this output in the journal, and can you share it?
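
In case it helps, here is a rough sketch of one way to collect the stacks of
the blocked workers. It is untested on this setup. The helper names
blocked_nvmet_workers and dump_blocked_stacks are made up for illustration,
and it assumes `ps aux` column layout as in the listing above (PID in
column 2, STAT in column 8) and a kernel that exposes /proc/<pid>/stack
(CONFIG_STACKTRACE, readable by root only).

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux`-style lines on stdin and print the PIDs
# of nvmet_tcp_wq workers whose STAT column starts with D (uninterruptible sleep).
blocked_nvmet_workers() {
    awk '$8 ~ /^D/ && /nvmet_tcp_wq/ { print $2 }'
}

# Hypothetical helper: read PIDs on stdin and print each task's kernel stack.
# Assumes /proc/<pid>/stack exists and that the caller is root.
dump_blocked_stacks() {
    while read -r pid; do
        echo "== pid $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Run as root, this would be used as
`ps aux | blocked_nvmet_workers | dump_blocked_stacks`. Alternatively,
`echo w > /proc/sysrq-trigger` asks the kernel to log all tasks in the
blocked (D) state, and `journalctl -k -b` should then show them, provided
sysrq is enabled on the system.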
