[PATCH] nvme-tcp: Do not terminate commands when in RESETTING

Thu Jan 11 06:10:08 PST 2024

On 1/11/24 13:30, Sagi Grimberg wrote:
> 
>> From: Hannes Reinecke <hare at suse.de>
>>
>> Terminating commands from the timeout handler might lead
>> to a data corruption as the timeout might trigger before
>> KATO expired.
>> This is the case when several commands have been started
>> before the keep-alive command and the command timeouts
>> trigger just after the keep-alive command has been sent.
>> Then the first command will trigger an error recovery,
>> but all the other commands will be aborted directly
>> and immediately retried.
>> So return BLK_EH_RESET_TIMER for I/O commands when
>> error recovery has been started to ensure that the
>> commands will be retried only after the KATO interval.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>>   drivers/nvme/host/tcp.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index b234f0674aeb..b9ec121b3fc6 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -2429,6 +2429,18 @@ static enum blk_eh_timer_return 
>> nvme_tcp_timeout(struct request *rq)
>>            rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
>>            nvme_opcode_str(qid, opc, fctype), qid);
>> +    /*
>> +     * If the error recovery is started we should ignore all
>> +     * I/O commands as they'll be aborted once error recovery starts.
>> +     * Otherwise they'll be failed over immediately and might
>> +     * cause data corruption.
> 
> I think that we can settle with a simpler comment:
>      /*
>       * If the controller is in state RESETTING, the error
>       * recovery handler will cancel all inflight commands soon
>       * which in turn may failover to a different path.
>       */
> 
> The error handler will start promptly anyways.
> 
Ok.

>> +     */
>> +    if (ctrl->state == NVME_CTRL_RESETTING && qid > 0) {
> 
> What would be the issue doing this for admin commands as well?
> 
There are several commands sent during the 'connect' workflow before
the queue is switched to 'LIVE', and we need to terminate them correctly
such that the 'connect' workflow can make progress (and we're not
deadlocking waiting for the commands to complete).
(Cf the comment after the 'LIVE' check later in the code)

Plus all commands on the admin queue do not change the state of the
data, so it doesn't matter if they got re-issued to a different
controller. Well, it _might_ matter, but then we'll be getting
an NVMe status back and we'll be notified in the message log.

Core issue with this patch is that we want to avoid a silent
failover of data I/O before KATO expires.

>> +        /* Avoid interfering with firmware download */
>> +        if (!WARN_ON(work_pending(&ctrl->fw_act_work)))
> 
> Let's ignore the fw_act_work, it's not a thing in fabrics.
> 
Wasn't sure, but if you say so, ok.

>> +            return BLK_EH_RESET_TIMER;
>> +    }
>> +
>>       if (ctrl->state != NVME_CTRL_LIVE) {
>>           /*
>>            * If we are resetting, connecting or deleting we should
> 
> I think that the comment here needs an update as well.
> 
Oh, I thought that I had modified that, too. But yes, of course.

> BTW, what happens when deleting a controller? Won't the state be
> DELETING and the situation you are trying to prevent happen as well?

Not sure. Haven't really tested that path, and it'll be hard to trigger
anyway (you would have to trigger controller deletion _just_ before the
I/O timeout hits).
But yeah, I could check for DELETING, too.

Cheers,

Hannes