[PATCH v1 0/4] Add command id quirk for fabrics

Max Gurtovoy mgurtovoy at nvidia.com
Thu Nov 11 01:29:11 PST 2021


On 11/10/2021 9:45 PM, Sagi Grimberg wrote:
> Hey, sorry for the late chime here, ramping up on some emails.
>
>>>>>>> Max, if you can't point us to a broken target (and yes, it is broken)
>>>>>>> this will not go anywhere.
>>>>>> Any target that uses an Apple device as its backend can be harmed.
>>>>>>
>>>>>> The simplest example is the Linux PT target, which copies the SQE as-is
>>>>>> and passes it to the Apple NVMe drive.
>>>>> Take another close look at how command_ids are assigned by the Linux
>>>>> driver. We obviously do not pass it through, as that would be
>>>>> completely broken.
>>>> Also worth noting this driver has always defined the command id as a
>>>> __u16, not __le16, yet we don't have any bug reports from big-endian
>>>> hosts.
>>>
>>> Right, my bad. I thought that the pass-through target used the same id.
>>>
>>> Linux PT target works fine.
>>>
>>> Bad example.
>>>
>>> The Linux kernel world is covered, but I still think we need to add this
>>> ability for fabrics controllers, as we did for PCI controllers.
>>>
>>> There are a lot of vendors out there with their own optimizations and
>>> solutions. Adding code by default to cover a broken TCP target (nobody
>>> has said which target it is or why it was never fixed), code that hurts
>>> others even if it is spec compliant, is not good practice.
>
> Completely disagree here. The original TCP report was just an example of
> the lack of protection we have against spurious completions. There is
> nothing specific to nvme-tcp here; this was discussed and agreed on in
> the original report.
>
You are ignoring the facts:

1. The device that broke the spec in the first place was the one that
caused you to add the gen bits to the CID.

2. These gen bits are what limit the queue depth to 4K.

3. It's not mentioned anywhere in the spec, and if it were intended to be
implemented the way it is now, the spec would have mentioned it.

4. Since the gen bits were introduced, other devices (such as Apple's) have
broken, hence the quirk for PCI.

5. The gen bits add "if" conditions and logic to the fast path for
"innocent" transports.

6. This series just extends this quirk for fabrics.

7. Even when they are not broken, some devices may suffer reduced
performance when the CID space spans all 16 possible bits, a fact we are
ignoring.

8. This series provides a flag to disable the default behavior per
connection.

9. This series doesn't add any logic to fast path.

10. My patch from last year adding resiliency to nvme_pci was rejected
because it added one if condition to the fast path, so there is no
consistency.
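A minimal sketch of the command id layout under discussion, assuming the split implied by point 2: a 4-bit generation counter in the upper bits and a 12-bit tag in the lower bits (12 tag bits is where a 4096-entry queue-depth ceiling comes from). The struct names, the helpers and the skip_cid_gen field are hypothetical stand-ins for the per-controller quirk mentioned in points 4, 6 and 8; this is not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: upper 4 bits of the 16-bit command id carry a
 * generation counter, lower 12 bits carry the request tag.
 */
#define CID_GEN_SHIFT 12
#define CID_TAG_MASK  0x0fffu   /* 12 tag bits -> at most 4096 tags */
#define CID_GEN_MASK  0x000fu   /* 4 generation bits */

struct ctrl {
	bool skip_cid_gen;      /* hypothetical quirk: send plain tags */
};

struct req {
	uint16_t tag;
	uint8_t  genctr;        /* bumped on every reuse of this tag */
};

/* Build the command id for a request about to be submitted. */
static uint16_t cid_setup(const struct ctrl *ctrl, struct req *rq)
{
	assert(rq->tag <= CID_TAG_MASK);  /* tag must fit in 12 bits */
	if (ctrl->skip_cid_gen)
		return rq->tag;           /* quirked: CID is just the tag */
	rq->genctr = (rq->genctr + 1) & CID_GEN_MASK;  /* wraps 15 -> 0 */
	return (uint16_t)((rq->genctr << CID_GEN_SHIFT) | rq->tag);
}

/* Split a command id back into its tag and generation fields. */
static uint16_t cid_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

static uint8_t cid_gen(uint16_t cid)
{
	return (cid >> CID_GEN_SHIFT) & CID_GEN_MASK;
}
```

With skip_cid_gen set, the id on the wire is just the tag, which is the behavior a quirk restores for devices that interpret the command id instead of treating it as opaque.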


>> Could you qualify the harm this caused? The command id is just an opaque
>> cookie; the target should not do any interpretation on it, so this
>> encoding should be inconsequential from the target's perspective.
>
> Exactly, the command id is an opaque value whose use is solely up to the
> host's discretion. It's pure coincidence that Linux uses it for command
> indexes.
>
> Any implementation that interprets command ids to _anything_ needs
> a quirk, not the other way around.
>
>> There are more hosts than just Linux that may encode ids with flags for
>> driver use, so non-compliance here is just asking for trouble.
>
> I know of at least one significant host implementation where command
> ids are not indexes.
>
>> If a vendor wants to constrain the command id for some vendor specific
>> optimization, they should bring forth a TPar and fight it out in the
>> workgroup.
>>
>> We did get bug reports that not validating command ids will crash the
>> kernel or corrupt data if an unexpected response is observed. Even
>> though the incorrect id is not the kernel's fault, we generally strive
>> for resilience against those types of observations in spite of
>> potentially flaky hardware.
>
> Agreed.
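On the completion side, generation bits let a host drop a completion whose command id does not match an outstanding request instead of completing the wrong one. A minimal sketch, assuming a 4-bit generation / 12-bit tag split, a fixed illustrative QUEUE_DEPTH and a plain array of requests indexed by tag; all names are hypothetical. These lookups are also the kind of per-completion checks that point 5 counts as fast-path cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CID_GEN_SHIFT 12
#define CID_TAG_MASK  0x0fffu
#define QUEUE_DEPTH   64        /* illustrative value */

static_assert(QUEUE_DEPTH <= CID_TAG_MASK + 1,
	      "queue depth must fit in the 12 tag bits");

struct req {
	bool    in_flight;
	uint8_t genctr;         /* generation stamped into the CID at submit */
};

/* Return the matching request, or NULL for a spurious or stale completion. */
static struct req *find_req(struct req *reqs, uint16_t cid)
{
	uint16_t tag = cid & CID_TAG_MASK;
	uint8_t gen = (uint8_t)(cid >> CID_GEN_SHIFT);

	if (tag >= QUEUE_DEPTH || !reqs[tag].in_flight) {
		fprintf(stderr, "invalid cid 0x%04x: no such request\n",
			(unsigned)cid);
		return NULL;
	}
	if (reqs[tag].genctr != gen) {
		fprintf(stderr, "stale cid 0x%04x: generation %u != %u\n",
			(unsigned)cid, (unsigned)gen,
			(unsigned)reqs[tag].genctr);
		return NULL;
	}
	return &reqs[tag];
}
```

For example, if tag 3 was submitted with generation 2, a completion carrying cid 0x1003 (tag 3, generation 1) is rejected as stale rather than completing the newer command.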


