[PATCH 1/3] nvme-tcp: Do not terminate commands when in RESETTING

Sagi Grimberg sagi at grimberg.me
Tue Sep 12 05:15:45 PDT 2023


>>> When the error recovery started it will terminate all commands
>>> anyway, so we should just return BLK_EH_RESET_TIMER for a
>>> command timeout.
>>
>> Does this fix a real issue? the timeout handler and the error
>> recovery should be mutually exclusive.
>>
> They are not. When several timeouts are triggering at the same time,
> only the first will start error recovery; the remaining ones will
> still hit the timeout function until cancel_tagset() is called.

Yes, but they do not access any resources unsynchronized. Meaning
is this a use-after-free issue?

> 
>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>> ---
>>>   drivers/nvme/host/tcp.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 5b332d9f87fc..1faef1cf5c94 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -2274,6 +2274,14 @@ static enum blk_eh_timer_return 
>>> nvme_tcp_timeout(struct request *rq)
>>>           nvme_tcp_queue_id(req->queue), nvme_cid(rq), pdu->hdr.type,
>>>           opc, nvme_opcode_str(qid, opc, fctype));
>>> +    /*
>>> +     * If the error recovery is started all commands will be
>>> +     * aborted anyway, and nothing is to be done here.
>>> +     */
>>> +    if (ctrl->state == NVME_CTRL_RESETTING &&
>>> +        work_pending(&to_tcp_ctrl(ctrl)->err_work))
>>> +        return BLK_EH_RESET_TIMER;
>>
>> What does the work_pending() check give you here?
>>
> 'RESETTING' is also entered for firmware download, so we need to check
> for work_pending() to find out if the error recovery engaged.

firmware download? what happens if err_work is yet to run when
this request times out?

Not sure what this is protecting against.



More information about the Linux-nvme mailing list