nvme-tcp: i/o errors and stalled host from failure to send command pdus

Jonathan Nicklin jnicklin at blockbridge.com
Thu Sep 1 15:56:18 PDT 2022


On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
<jnicklin at blockbridge.com> wrote:
>
> nvme_tcp_queue_more() reads queue->more_requests. So, I suppose
> nvme_tcp_queue_request() is an example where queue->more_requests gets
> read without the send_mutex? queue_work_on() is conditional on this.
>
> Apologies in advance if I'm misunderstanding the code.
>
> On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi at grimberg.me> wrote:
> >
> >
> > > Question: queue->more_requests is updated under a lock. However, not
> > > all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> > > values seem to show that the same queue was being processed on
> > > different CPUs. Is this a race between the kworker and inline
> > > submission?
> >
> > The entire send path in io_work is wrapped with the send_mutex. Which
> > part is not protected with the lock?

Okay, maybe some progress?! What appears to be happening is that
the request marked "last" is being picked up via nvme_tcp_send_all() when
processing the previous command. Consider another trace:

                 CPU
tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
tag=5e op=0 data_len=16384
tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5e op=0 page_offset=1352 send_len=72
tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
tag=5d op=0 data_len=32768
tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5d op=0 page_offset=1424 send_len=72
tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
tag=5c op=0 data_len=32768
tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5c op=0 page_offset=1496 send_len=72
tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
tag=5b op=0 data_len=32768
tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5b op=0 page_offset=1568 send_len=72
kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
qid=5 tag=5a op=0 data_len=16384
tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
qid=5 tag=5a op=0 page_offset=1640 send_len=72

I've added some additional state to the request to print out on
timeout. First, queue->more_requests now tracks the tag of the command
that asserted it. Second, I recorded the number of commands issued via
nvme_tcp_send_all(). As you can see, when tag 0x5b was issued, its
nvme_tcp_send_all() call ended up sending 0x5a as well. more_requests
was still asserted by 0x5b when nvme_tcp_try_send_cmd_pdu() processed
0x5a, so we never actually pushed it out onto the wire.

[  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5e req_sent 1
[  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5d req_sent 1
[  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5c req_sent 1
[  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
[  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5b !!! req_sent 0
