[PATCH] nvme-tcp: handle failing req immediately for REQ_FAILFAST_DRIVER

Sagi Grimberg sagi at grimberg.me
Thu Apr 7 06:42:55 PDT 2022


>>>> I'm wondering if this will race with nvme_cancel_admin_tagset which
>>>> can also complete the command... The whole assumption is that the
>>>> cancel iterator will be running as the socket is shut down by the peer
>>>> and error_recovery was triggered.
>>>>
>>>> I think these shouldn't race because we shut down the queue (and flush
>>>> the io_work) and only then cancel. But I've seen races of that
>>>> sort in the past...
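For context, the ordering being relied on here is roughly the sketch below, paraphrased from drivers/nvme/host/tcp.c; the function names are real but the bodies are trimmed and details may differ between trees. The point is that the socket is shut down and io_work is flushed before the cancel iterator ever runs:

static void __nvme_tcp_stop_queue(struct nvme_tcp_queue *queue)
{
        /* shut the socket down so no further sends/receives succeed */
        kernel_sock_shutdown(queue->sock, SHUT_RDWR);
        nvme_tcp_restore_sock_calls(queue);
        /* flush io_work: after this no send/recv path touches requests */
        cancel_work_sync(&queue->io_work);
}

static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl, bool remove)
{
        blk_mq_quiesce_queue(ctrl->admin_q);
        nvme_tcp_stop_queue(ctrl, 0);   /* socket shutdown + io_work flush */
        nvme_cancel_admin_tagset(ctrl); /* cancel iterator runs only now */
        if (remove)
                blk_mq_unquiesce_queue(ctrl->admin_q);
        nvme_tcp_destroy_admin_queue(ctrl, remove);
}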
>>>
>>> This particular request is issued by nvme_shutdown_ctrl() so we know
>>> exactly where we are. But this might not be the case for any other
>>> REQ_FAILFAST_DRIVER flagged request. So this might introduce the race
>>> you are pointing out.
>>>
>>> The option I see is what Martin suggested in
>>>
>>> http://lists.infradead.org/pipermail/linux-nvme/2021-September/027867.html
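To make the nvme_shutdown_ctrl() reference above concrete: the flagged request comes from the register accesses below, paraphrased from drivers/nvme/host/core.c (details may differ between trees). Over fabrics, each ->reg_write32()/->reg_read32() is translated into a property set/get sync command, and that is the command which can end up stuck when the connection is already gone:

int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl)
{
        unsigned long timeout = jiffies + (ctrl->shutdown_timeout * HZ);
        u32 csts;
        int ret;

        ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
        ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;

        /* over fabrics this becomes a property set */
        ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
        if (ret)
                return ret;

        /* ... and every CSTS poll below becomes a property get */
        while ((ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts)) == 0) {
                if ((csts & NVME_CSTS_SHST_MASK) == NVME_CSTS_SHST_CMPLT)
                        break;

                msleep(100);
                if (fatal_signal_pending(current))
                        return -EINTR;
                if (time_after(jiffies, timeout)) {
                        dev_err(ctrl->device,
                                "Device shutdown incomplete; abort shutdown\n");
                        return -ENODEV;
                }
        }

        return ret;
}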
>>
>> Don't know if I like the state check. Why does this not apply in a ctrl
>> reset?
> 
> Understood. Would something like this work
> 
>        struct request *rq = blk_mq_rq_from_pdu(queue->request);
>        bool queue_ready = test_bit(NVME_TCP_Q_LIVE, &queue->flags);
> 
>        dev_err(queue->ctrl->ctrl.device,
>                "failed to send request %d\n", ret);
>        if ((ret != -EPIPE && ret != -ECONNRESET) ||
>            (!queue_ready && rq->cmd_flags & REQ_FAILFAST_DRIVER))
>                nvme_tcp_fail_request(queue->request);
> 
> ?
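For reference, the check above would slot into the error leg at the end of nvme_tcp_try_send(); today that leg looks roughly like this (paraphrased from drivers/nvme/host/tcp.c, names and details may vary by tree):

        if (ret == -EAGAIN) {
                ret = 0;
        } else if (ret < 0) {
                dev_err(queue->ctrl->ctrl.device,
                        "failed to send request %d\n", ret);
                /*
                 * EPIPE/ECONNRESET are currently left for error recovery
                 * and the cancel iterator to clean up; the proposal above
                 * would additionally fail FAILFAST requests right here
                 * when the queue is no longer LIVE.
                 */
                if (ret != -EPIPE && ret != -ECONNRESET)
                        nvme_tcp_fail_request(queue->request);
                nvme_tcp_done_send_req(queue);
        }
        return ret;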

But in your case the queue is live...

I'm thinking that maybe we need a register access timeout value for
fabrics...
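
One purely illustrative shape such a timeout could take is a fabrics option that bounds the property get/set commands issued for ->reg_read32()/->reg_write32() instead of letting them run for the full admin timeout. Everything below is hypothetical: the reg_timeout field, the helper and its name do not exist in any tree.

/*
 * Hypothetical helper: ctrl->opts->reg_timeout (seconds) would be a new
 * nvmf option, 0 meaning "keep the default admin timeout".
 */
static void nvmf_set_reg_timeout(struct nvme_ctrl *ctrl, struct request *rq)
{
        if (ctrl->opts && ctrl->opts->reg_timeout)
                rq->timeout = ctrl->opts->reg_timeout * HZ;
}

Called from the existing register access helpers (nvmf_reg_read32/64, nvmf_reg_write32) before the sync command is executed, a dead connection would then fail the register access after reg_timeout seconds rather than after the admin timeout.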


