[PATCH] nvme-tcp: Check if request has started before processing it

Mon Feb 15 16:29:45 EST 2021

>>>>>> blk_mq_tag_to_rq() will always return a request if the command_id is
>>>>>> in the valid range. Check if the request has been started. If we
>>>>>> blindly process the request we might double complete a request which
>>>>>> can be fatal.
>>>>>
>>>>> How did you get to this one? did the controller send a completion for
>>>>> a completed/bogus request?
>>>>
>>>> If that is the case, then that must mean it's possible the driver could
>>>> have started the command id just before the bogus completion check. Data
>>>> iorruption, right?
> 
> 'during TCP LIF toggles and aggr relocates' testing the host
> crashes. TBH, I do not really know what is happening or what the test
> does. Still trying to figure out what's going on.

Well, I think we should probably figure out why that is happening first.

> I was just very
> surprised how much the code trusts the other side to behave correctly./

What does pci/rdma/fc does differently? What does scsi do here?

>>> Yes, which is why I don't think this check is very useful..
>>
>> I actually view that as a valid protection against spoofed frames.
>> Without it it's easy to crash the machine by injecting fake completions with
>> random command ids.
> 
> In this test scenario it's not even a spoofed frame; maybe just a
> confused controller.

Maybe... I am still not sure how this patch helps here though...