[PATCH 1/3] nvme-tcp: Do not terminate commands when in RESETTING
Sagi Grimberg
sagi at grimberg.me
Tue Sep 12 05:15:45 PDT 2023
>>> When the error recovery started it will terminate all commands
>>> anyway, so we should just return BLK_EH_RESET_TIMER for a
>>> command timeout.
>>
>> Does this fix a real issue? the timeout handler and the error
>> recovery should be mutually exclusive.
>>
> They are not. When several timeouts are triggering at the same time,
> only the first will start error recovery; the remaining ones will
> still hit the timeout function until cancel_tagset() is called.
Yes, but they do not access any resources unsynchronized. Meaning
is this a use-after-free issue?
>
>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>> ---
>>> drivers/nvme/host/tcp.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 5b332d9f87fc..1faef1cf5c94 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -2274,6 +2274,14 @@ static enum blk_eh_timer_return
>>> nvme_tcp_timeout(struct request *rq)
>>> nvme_tcp_queue_id(req->queue), nvme_cid(rq), pdu->hdr.type,
>>> opc, nvme_opcode_str(qid, opc, fctype));
>>> + /*
>>> + * If the error recovery is started all commands will be
>>> + * aborted anyway, and nothing is to be done here.
>>> + */
>>> + if (ctrl->state == NVME_CTRL_RESETTING &&
>>> + work_pending(&to_tcp_ctrl(ctrl)->err_work))
>>> + return BLK_EH_RESET_TIMER;
>>
>> What does the work_pending() check give you here?
>>
> 'RESETTING' is also entered for firmware download, so we need to check
> for work_pending() to find out if the error recovery engaged.
firmware download? what happens if err_work is yet to run when
this request times out?
Not sure what this is protecting against.
More information about the Linux-nvme
mailing list