[PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation

Hannes Reinecke hare at suse.de
Wed May 26 01:48:35 PDT 2021


On 5/26/21 10:41 AM, Sagi Grimberg wrote:
> 
>>>> The bad controller should be fixed.
>>>>
>>>> In the past, I've sent patches that check that sqid match in nvme
>>>> cqe to
>>>> protect faulty drives that might send
>>>> the completion on a wrong msix.
>>>>
>>>> My patch wasn't accepted since it added an additional "if" in the
>>>> fast path.
>>>>
>>>> Now we're adding much more operation in the fast path because of
>>>> buggy ctrl
>>>> ?
>>> I shared the same performance concern on v1 on this series. I haven't
>>> been able to test this one yet (only have emulation for two more weeks).
>>>
>>> Hannes says the bug this catches happens frequently enough on TCP. If we
>>> don't catch it, we get kernel panic or corruption, so we defintely
>>> need to
>>> do something. Sagi correctly noted this type of bug is not unique to TCP
>>> (or even NVMe, for that matter), but if there is a performance impact on
>>> PCI, and no one so far reports such an issue, I would still recommend
>>> this type of mitigation be isolated to transports that actually observe
>>> invalid CQEs.
>>>   
>>
>> Please do that if possible.
> 
> I disagree with that. The original report was about TCP, but that is
> pure coincidence as far as I'm concerned, as I indicated before there
> is nothing TCP specific about this.
> 
> Today we don't protect against such a scenario. I originally dismissed
> this issue as we should probably address the a buggy controller, but an
> argument was made that a buggy controller should not be able to crash
> the kernel and we should have a safe-guard for it, which is a valid
> argument, but this is not the worst that can happen, this could have
> caused a silent data corruption with a slightly different timing. So
> it seems absolutely appropriate to me considering the possible
> consequences.
> 
> The way I see it, we should decide where we stand here, either we
> continue to ignore this possible panic/data-corruption (which I
> personally think would be wrong), or we address it, and if we do,
> we should do it in the correct place, not limit it to the observed
> component.
> 
I am perfectly fine with extending this to all transports; the prime
reason why we haven't seen it is because other transports like RDMA or
FC are way more encapsulated, making it harder to receive invalid packets.
(Lossless fabrics and all that).
TCP is far more loosely structured, allowing for invalid packets are
more easily.
But really this issue is present in all transports, so we should be
addressing it on a general level.
(And it's only a matter of time until someone builds cheapo NVMe-pci
devices having similar issues :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare at suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)



More information about the Linux-nvme mailing list