[PATCH v1 0/4] Add command id quirk for fabrics

Sagi Grimberg sagi at grimberg.me
Wed Nov 10 11:45:17 PST 2021


Hey, sorry for chiming in late, I'm catching up on email.

>>>>>> Max, if you can't point us to a broken target (and yes, it is broken)
>>>>>> this will not go anywhere.
>>>>> Any target that uses Apple device as backend can be harmed.
>>>>>
>>>>> The simplest example is a Linux PT target that copies the sqe as-is and passes
>>>>> it to the NVMe Apple drive.
>>>> Take another close look at how command_ids are assigned by the Linux driver.
>>>> We obviously do not pass it through as that would be completely broken.
>>> Also worth noting this driver has always defined the command id as a
>>> __u16, not __le16, yet we don't have any bug reports from big-endian
>>> hosts.
>>
>> Right, my bad. I thought that the pass-through target uses the same id.
>>
>> Linux PT target works fine.
>>
>> Bad example.
>>
>> Linux kernel world is covered but I still think we need to add this ability
>> for fabrics controllers as we did for pci controllers.
>>
>> There are a lot of vendors out there with their own optimizations and
>> solutions. Adding code by default to cover a broken TCP target (no one has
>> said which target it is or why nobody fixed it), in a way that hurts others
>> (even if it's spec compliant), is not good practice.

Completely disagree here. The original TCP report was just an example of
the lack of protection we have against spurious completions. Nothing
about it is specific to nvme-tcp; this was discussed and agreed on in
the original report.

> Could you qualify the harm this caused? The command id is just an opaque
> cookie; the target should not do any interpretation on it, so this
> encoding should be inconsequential from the target's perspective.

Exactly, the command id is an opaque value, and how to use it is solely
at the host's discretion. It's pure coincidence that Linux uses it for
command indexes.

Any implementation that interprets command ids as _anything_ needs
a quirk, not the other way around.

> There are more hosts than just Linux that may encode id's with flags for
> driver use, so non-compliance here is just asking for trouble.

I know of at least one significant host implementation where command
ids are not indexes.

> If a vendor wants to constrain the command id for some vendor specific
> optimization, they should bring forth a TPar and fight it out in the
> workgroup.
> 
> We did get bug reports that not validating command id's will crash the
> kernel or corrupt data if an unexpected response is observed. Even
> though the incorrect id is not the kernel's fault, we generally strive
> for resilience against those types of observations in spite of
> potentially flaky hardware.

Agreed.
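
To make the kind of validation being discussed concrete, here is a
minimal userspace sketch, not the actual driver code. It assumes a
16-bit command id split into a 4-bit generation counter and a 12-bit
tag; the split and all names (struct slot, cid_encode, host_submit,
host_complete) are hypothetical and chosen for illustration only. The
target is expected to echo the id back untouched, and the host rejects
any completion whose id does not match a command that is in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: upper 4 bits generation counter, lower 12 bits tag. */
#define CID_TAG_BITS 12
#define CID_TAG_MASK ((1u << CID_TAG_BITS) - 1)
#define CID_GEN_MASK 0xfu

/* Hypothetical per-tag host state. */
struct slot {
	uint8_t genctr;   /* generation of the last command sent on this tag */
	bool in_flight;   /* true while a completion is still expected */
};

/* Pack generation counter and tag into one opaque 16-bit command id. */
static uint16_t cid_encode(uint8_t genctr, uint16_t tag)
{
	assert(tag <= CID_TAG_MASK);
	return (uint16_t)(((genctr & CID_GEN_MASK) << CID_TAG_BITS) | tag);
}

/* Start a command on 'tag': bump the generation and return the id to send. */
static uint16_t host_submit(struct slot *slots, uint16_t tag)
{
	struct slot *s = &slots[tag];

	s->genctr = (s->genctr + 1) & CID_GEN_MASK;
	s->in_flight = true;
	return cid_encode(s->genctr, tag);
}

/*
 * Handle a completion carrying 'cid'. Returns false for anything the host
 * did not expect: tag out of range, nothing in flight, or stale generation.
 */
static bool host_complete(struct slot *slots, uint16_t nr_slots, uint16_t cid)
{
	uint16_t tag = cid & CID_TAG_MASK;
	uint8_t gen = (cid >> CID_TAG_BITS) & CID_GEN_MASK;
	struct slot *s;

	if (tag >= nr_slots)
		return false;

	s = &slots[tag];
	if (!s->in_flight || s->genctr != gen)
		return false;

	s->in_flight = false;
	return true;
}
```

With a scheme like this, a duplicate or late completion for a reused tag
is dropped instead of completing the wrong command. It only works if the
target treats the id as an opaque cookie, which is the point above.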


