nvme-tcp: i/o errors and stalled host from failure to send command pdus

Sagi Grimberg sagi at grimberg.me
Wed Aug 31 16:42:06 PDT 2022


> 
> Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> 
> req->dbg.send = 1;
> ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> req->dbg.sendpage = ret;
> 
> 
> nvme-trace.out
> 
> tar-1785    [005] .....   232.768892: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=18 op=0 data_len=20480
> tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=17 op=0 data_len=32768
> tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=14 op=0 data_len=32768
> kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=13 op=0 data_len=32768
> kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=12 op=0 data_len=12288
> tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
>   nvme1: qid=8 tag=69 op=0 data_len=20480
> 
> 
> dmesg.out (reordered to match nvme-trace.out)
> 
> [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 1 lock 1 last 0 more 0 send 0 sendpage -1

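Something is not adding up...
lock=1 means that we got into nvme_tcp_send_all(), but send=0 and
sendpage=-1 suggest that we never reached the kernel_sendpage() call
for this request, so somehow we are not able to send it to the wire?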

I'd look into this and add an indication of when the request is pulled
from the req_list, and in turn from the send_list. It looks like it is
somehow picked out and silently dropped, because as long as it is on the
list(s) another context should have picked it up at some point...
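
To illustrate the kind of indication I mean, here is a rough userspace
model, not the driver code. It assumes the usual two-stage flow, where
requests are first pushed onto req_list and later drained onto send_list
before being fetched for sending. All struct, field and function names
below (pulled_req, pulled_send, fetch_request() and so on) are made up for
the sketch; in the driver the flags would live next to the existing
req->dbg fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request debug state; names are illustrative only. */
struct req_dbg {
	bool on_req_list;	/* request was pushed onto req_list */
	bool pulled_req;	/* request was moved from req_list to send_list */
	bool pulled_send;	/* request was fetched from send_list for sending */
};

struct request {
	int tag;
	struct req_dbg dbg;
	struct request *next;	/* single link, reused for both lists */
};

struct queue {
	struct request *req_list;	/* LIFO push list (models the lockless list) */
	struct request *send_list;	/* requests ready to be sent, oldest first */
};

/* Submission side: reset the debug state and push onto req_list. */
static void queue_request(struct queue *q, struct request *r)
{
	assert(q && r);
	r->dbg = (struct req_dbg){ .on_req_list = true };
	r->next = q->req_list;
	q->req_list = r;
}

/* Drain req_list onto send_list, marking every request that moves. */
static void process_req_list(struct queue *q)
{
	struct request *r = q->req_list, *next;

	q->req_list = NULL;
	for (; r; r = next) {
		next = r->next;
		r->dbg.pulled_req = true;
		/* Head insertion of a LIFO list restores submission order. */
		r->next = q->send_list;
		q->send_list = r;
	}
}

/* Sender side: take the next request off send_list and mark it. */
static struct request *fetch_request(struct queue *q)
{
	struct request *r;

	if (!q->send_list)
		process_req_list(q);
	r = q->send_list;
	if (!r)
		return NULL;
	q->send_list = r->next;
	r->next = NULL;
	r->dbg.pulled_send = true;
	return r;
}
```

With flags like these printed in the state dump, a stalled request with
pulled_req=1 and pulled_send=0 would still be sitting on send_list, while
pulled_send=1 together with send=0 would mean it was fetched and then
dropped before the pdu send was attempted.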

> [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
>   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 0 lock 0 last 1 more 1 send 1 sendpage 72


