nvme-tcp crashes the system when overloading the backend device.
Mark Ruijter
mruijter at primelogic.nl
Tue Aug 31 06:30:51 PDT 2021
Hi all,
I can consistently crash a system when I sufficiently overload the nvme-tcp target.
The easiest way to reproduce the problem is to create a raid5 array.
While this R5 is resyncing, export it with the nvmet-tcp target driver and start a high queue-depth 4K random fio workload from the initiator.
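Roughly, the setup looks like this; device names, the IP address and the NQN below are placeholders, and the fio parameters only matter insofar as they saturate the resyncing array:

# target side: create the raid5 and export it via nvmet-tcp (configfs)
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[bcde]
modprobe nvmet-tcp
cd /sys/kernel/config/nvmet
mkdir subsystems/testnqn
echo 1 > subsystems/testnqn/attr_allow_any_host
mkdir subsystems/testnqn/namespaces/1
echo /dev/md0 > subsystems/testnqn/namespaces/1/device_path
echo 1 > subsystems/testnqn/namespaces/1/enable
mkdir ports/1
echo tcp > ports/1/addr_trtype
echo ipv4 > ports/1/addr_adrfam
echo 192.0.2.10 > ports/1/addr_traddr
echo 4420 > ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/testnqn ports/1/subsystems/testnqn

# initiator side: connect and hammer the namespace with 4K random reads
nvme connect -t tcp -a 192.0.2.10 -s 4420 -n testnqn
fio --name=overload --filename=/dev/nvme1n1 --rw=randread --bs=4k \
    --iodepth=128 --numjobs=16 --ioengine=libaio --direct=1 \
    --time_based --runtime=600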
At some point the target system will start logging these messages:
[ 2865.725069] nvmet: ctrl 238 keep-alive timer (15 seconds) expired!
[ 2865.725072] nvmet: ctrl 236 keep-alive timer (15 seconds) expired!
[ 2865.725075] nvmet: ctrl 238 fatal error occurred!
[ 2865.725076] nvmet: ctrl 236 fatal error occurred!
[ 2865.725080] nvmet: ctrl 237 keep-alive timer (15 seconds) expired!
[ 2865.725083] nvmet: ctrl 237 fatal error occurred!
[ 2865.725087] nvmet: ctrl 235 keep-alive timer (15 seconds) expired!
[ 2865.725094] nvmet: ctrl 235 fatal error occurred!
Even when you stop all I/O from the initiator, some of the nvmet_tcp_wq workers keep running forever.
The load shown with "top" never returns to the normal idle level.
root 5669 1.1 0.0 0 0 ? D< 03:39 0:09 [kworker/22:2H+nvmet_tcp_wq]
root 5670 0.8 0.0 0 0 ? D< 03:39 0:06 [kworker/55:2H+nvmet_tcp_wq]
root 5676 0.2 0.0 0 0 ? D< 03:39 0:01 [kworker/29:2H+nvmet_tcp_wq]
root 5677 12.2 0.0 0 0 ? D< 03:39 1:35 [kworker/59:2H+nvmet_tcp_wq]
root 5679 5.7 0.0 0 0 ? D< 03:39 0:44 [kworker/27:2H+nvmet_tcp_wq]
root 5680 2.9 0.0 0 0 ? I< 03:39 0:23 [kworker/57:2H-nvmet_tcp_wq]
root 5681 1.0 0.0 0 0 ? D< 03:39 0:08 [kworker/60:2H+nvmet_tcp_wq]
root 5682 0.5 0.0 0 0 ? D< 03:39 0:04 [kworker/18:2H+nvmet_tcp_wq]
root 5683 5.8 0.0 0 0 ? D< 03:39 0:45 [kworker/54:2H+nvmet_tcp_wq]
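Where such a stuck worker is blocked can be inspected by dumping its kernel stack, e.g. for the first PID in the listing above, or by dumping all D-state tasks at once via sysrq:

cat /proc/5669/stack          # kernel stack of one stuck kworker
echo w > /proc/sysrq-trigger  # dump all uninterruptible (D state) tasks to dmesg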
The number of running nvmet_tcp_wq workers keeps increasing once you hit the problem:
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvmet_tcp_wq | tail -3
41114 ? D< 0:00 [kworker/25:21H+nvmet_tcp_wq]
41152 ? D< 0:00 [kworker/54:25H+nvmet_tcp_wq]
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvme | grep wq | wc -l
500
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvme | grep wq | wc -l
502
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvmet_tcp_wq | wc -l
503
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvmet_tcp_wq | wc -l
505
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvmet_tcp_wq | wc -l
506
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvmet_tcp_wq | wc -l
511
gold:/var/crash/2021-08-26-08:38 # ps ax | grep nvmet_tcp_wq | wc -l
661
Eventually the system runs out of resources.
At some point the load reaches 2000+ and the system crashes.
So far, I have been unable to determine why the number of nvmet_tcp_wq workers keeps increasing.
Presumably each worker that gets stuck is replaced by a new worker without the old one ever being terminated.
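If that is what happens, most of the accumulating workers should be sitting in uninterruptible sleep, which a quick count of the D-state kworkers would confirm (STAT is the third column of "ps ax"):

ps ax | grep '[n]vmet_tcp_wq' | awk '$3 ~ /^D/' | wc -l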
Thanks,
Mark Ruijter