[PATCH v1 0/4] Add command id quirk for fabrics

Keith Busch kbusch at kernel.org
Thu Nov 11 09:36:28 PST 2021


On Thu, Nov 11, 2021 at 11:29:11AM +0200, Max Gurtovoy wrote:
> 1. The device that broke the spec in the first place was the device that
> caused you to add the gen bits to the CID.
> 
> 2. These gen bits are what cause the 4K queue-depth limit.

That is true. At least one person (Bart) didn't like that.

On the other hand, we do observe timeouts in production environments
because the queues are already too large: there is simply not enough
bandwidth available to satisfy the default host latency requirements at
current depths. Going deeper doesn't improve IOPS or throughput, but it
does increase tail latencies.
 
> 3. It's not mentioned anywhere in the spec, and if it was intended to be
> implemented the way it is now, it would have been mentioned in the spec.

The only thing the spec says is that host software must ensure the CID
is unique on the SQ for the lifetime of the command. There's no need for
the spec to cover every possible way a driver can satisfy that
requirement.
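
For reference, here's a rough sketch of the kind of encoding that
satisfies that requirement. The names and the 4/12 bit split are
illustrative, not the exact driver code:

#include <stdint.h>

/*
 * One way to keep the CID unique on the SQ for the lifetime of a
 * command: a few upper bits carry a per-tag generation counter and
 * the remaining bits carry the queue tag.
 */
#define CID_GEN_BITS	4
#define CID_TAG_BITS	(16 - CID_GEN_BITS)
#define CID_TAG_MASK	((1u << CID_TAG_BITS) - 1)	/* 0x0fff */

static inline uint16_t encode_cid(unsigned int genctr, unsigned int tag)
{
	return (uint16_t)(((genctr & ((1u << CID_GEN_BITS) - 1)) << CID_TAG_BITS) |
			  (tag & CID_TAG_MASK));
}

static inline unsigned int cid_to_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

static inline unsigned int cid_to_genctr(uint16_t cid)
{
	return cid >> CID_TAG_BITS;
}

The tag is already unique among outstanding commands, and the
generation bits only change when a tag is reused, so the combined CID
stays unique for as long as the spec requires.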

> 4. Since the gen bits were introduced, other devices have broken (such as
> Apple's), hence the quirk for PCI.

Apple doesn't care about spec compatibility, though. They support their
hardware only with their own host software, hence all the Apple-specific
Linux quirks.

> 5. The gen bits add "if" conditions and logic to the fast path for
> "innocent" transports.

The quirk doesn't change that, though.

> 6. This series just extends this quirk for fabrics.

My only request has been that you identify at least one real fabrics
target that behaves this way. So far it sounds like this series is
speculating they might exist.

And if they do exist, you will have to change all the __u16 command_id
fields to __le16 and use the appropriate cpu_to_le16() wrappers;
otherwise big-endian machines will produce completely different command
IDs than little-endian ones. Since we haven't heard a single bug report
from big-endian hosts, I am pretty sure that such target behavior
doesn't actually exist in the field.
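
To illustrate the byte-order problem, assuming (hypothetically) a
target that parses the CID layout rather than just echoing it:

#include <stdint.h>

/*
 * Only matters if a target actually interprets the CID's internal
 * layout instead of echoing it back.  A raw host-order store and an
 * explicit little-endian store produce different wire bytes on a
 * big-endian host.
 */
static inline uint16_t to_le16(uint16_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (uint16_t)((v >> 8) | (v << 8));	/* swap for the wire */
#else
	return v;				/* already little-endian */
#endif
}

/*
 * Example: CID 0x1005 (genctr 1, tag 5) with a 4/12 split.
 *   little-endian host, raw store:  bytes 05 10
 *   big-endian host, raw store:     bytes 10 05
 * A layout-sensitive target reading the second case as little-endian
 * sees 0x0510, i.e. genctr 0 and tag 0x510 -- which is why __le16 plus
 * cpu_to_le16() would be mandatory if such targets existed.
 */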

> 7. Even if not broken, some devices may suffer reduced performance when
> the CID space spans all 16 possible bits - a fact that we ignore.

I don't see how. Why would this matter to any target? It's just a
cookie to pass back.
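
The target copies the CID from the SQE into the CQE untouched; only
the host ever decodes it. A sketch of the host side, reusing the
illustrative split from the earlier example:

#include <stdint.h>

#define CID_GEN_BITS	4
#define CID_TAG_BITS	(16 - CID_GEN_BITS)
#define CID_TAG_MASK	((1u << CID_TAG_BITS) - 1)

/*
 * The target just echoes the 16-bit cookie; the host recovers the tag
 * and can reject a completion whose generation bits don't match the
 * command currently outstanding on that tag.
 * Returns the tag, or -1 for a stale or corrupted completion.
 */
int host_decode_cid(uint16_t echoed_cid, unsigned int expected_genctr)
{
	unsigned int tag = echoed_cid & CID_TAG_MASK;
	unsigned int gen = echoed_cid >> CID_TAG_BITS;

	if (gen != (expected_genctr & ((1u << CID_GEN_BITS) - 1)))
		return -1;
	return (int)tag;
}

Nothing on the target side needs to know how those bits are split,
which is why using the full 16-bit space shouldn't cost it anything.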
 
> 8. This series provides a flag to disable the default behavior per
> connection.
> 
> 9. This series doesn't add any logic to fast path.
> 
> 10. My patch from last year adding resiliency to nvme_pci was rejected
> because it added one "if" condition to the fast path - no consistency.

That decision was based only on what anyone could measure on available
hardware. I have completely different machines now than what I tested
with at that time. If someone can measure a loss with the current
driver, that may change the discussion.
