[PATCH] nvme-host: tcp: do not read from the socket if we are performing a reset

Tue May 10 11:28:16 PDT 2022

Hey John, Sorry for the late response.

> 
> This bug has been reproduced on a partner's testbed.  We don't
> have direct access to the tests they are running but it appears
> they are running error insertion tests which aggressively
> bring the TCP link up and down, reset the NVMe/TCP controller,
> and randomly add/remove NVMe namespaces from the controller
> while IO is in progress. With these tests they have been able
> to reproduce this bug with our v5.16 based kernel.
> 
> I do see a number of patches from v5.17 and v5.18 which
> look like they might affect this problem.  Specifically:
> 
> 63573807b27e0faf8065a28b1bbe1cbfb23c0130 nvme-tcp: fix bogus request 
> completion when failing to send AER
> d6d6742772d712ed2238f5071b96baf4924f5fad nvme: fix RCU hole that allowed 
> for endless looping in multipath round robin
> a4a6f3c8f61c3cfbda4998ad94596059ad7e4332 nvme-multipath: fix hang when 
> disk goes live over reconnect
> 
> This is the testbed that reproduced the RCU hole fixed in 
> d6d6742772d712ed2238f5071b96baf4924f5fad.
> We should definitely make sure they are running with that fix before we 
> go any further.
> 
> If you agree the above patches are needed we can give them a new 
> v5.18-rc6 based kernel
> and ask them to run their test again.

These are all useful and needed fixes, but please see the fix I've
referenced below as this issue looks very similar to what this patch
fixes.

> 
> /John
> 
>>>
>>> Signed-off-by: Maurizio Lombardi <mlombard at redhat.com>
>>> ---
>>>   drivers/nvme/host/tcp.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index ad3a2bf2f1e9..e3ef014bbc0b 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -103,6 +103,7 @@ enum nvme_tcp_queue_flags {
>>>       NVME_TCP_Q_ALLOCATED    = 0,
>>>       NVME_TCP_Q_LIVE        = 1,
>>>       NVME_TCP_Q_POLLING    = 2,
>>> +    NVME_TCP_Q_CONNECTING    = 3,
>>>   };
>>>   enum nvme_tcp_recv_state {
>>> @@ -1213,6 +1214,10 @@ static void nvme_tcp_io_work(struct 
>>> work_struct *w)
>>>           bool pending = false;
>>>           int result;
>>> +        if (unlikely(!test_bit(NVME_TCP_Q_LIVE, &queue->flags) &&
>>> +                !test_bit(NVME_TCP_Q_CONNECTING, &queue->flags)))
>>> +            return;
>>
>> How is this protecting anything? If the queue is torn down we may access
>> other freed memory, not just from accessing the sockets.
>>
>>> +
>>>           if (mutex_trylock(&queue->send_mutex)) {
>>>               result = nvme_tcp_try_send(queue);
>>>               mutex_unlock(&queue->send_mutex);
>>> @@ -1670,6 +1675,8 @@ static int nvme_tcp_start_queue(struct 
>>> nvme_ctrl *nctrl, int idx)
>>>       struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
>>>       int ret;
>>> +    set_bit(NVME_TCP_Q_CONNECTING, &ctrl->queues[idx].flags);
>>> +
>>>       if (idx)
>>>           ret = nvmf_connect_io_queue(nctrl, idx);
>>>       else
>>> @@ -1683,6 +1690,7 @@ static int nvme_tcp_start_queue(struct 
>>> nvme_ctrl *nctrl, int idx)
>>>           dev_err(nctrl->device,
>>>               "failed to connect queue: %d ret=%d\n", idx, ret);
>>>       }
>>> +    clear_bit(NVME_TCP_Q_CONNECTING, &ctrl->queues[idx].flags);
>>>       return ret;
>>>   }
>>
>> Question, can you tell from your stack dump if this queue is the admin
>> queue or I/O queue?
>>
>> If this is indeed the admin queue, please check if you have a related
>> fix applied:

This is the possible fix patch:

>> 0fa0f99fc84e ("nvme: fix a possible use-after-free in controller reset 
>> during load")