[PATCH v1 0/4] Add command id quirk for fabrics

Keith Busch kbusch at kernel.org
Tue Nov 9 11:04:32 PST 2021


On Tue, Nov 09, 2021 at 06:59:05PM +0200, Max Gurtovoy wrote:
> 
> On 11/9/2021 6:15 PM, Keith Busch wrote:
> > On Tue, Nov 09, 2021 at 03:31:02PM +0100, Christoph Hellwig wrote:
> > > On Tue, Nov 09, 2021 at 04:23:33PM +0200, Max Gurtovoy wrote:
> > > > On 11/9/2021 3:15 PM, Christoph Hellwig wrote:
> > > > > Max, if you can't point us to a broken target (and yes, it is broken)
> > > > > this will not go anywhere.
> > > > Any target that uses Apple device as backend can be harmed.
> > > > 
> > > > Most simple example is Linux PT target that copies the sqe as-is and passes
> > > > it to the NVMe Apple drive.
> > > Take another close look at how command_ids are assigned by the Linux driver.
> > > We obviously do not pass it through as that would be completely broken.
> > Also worth noting this driver has always defined the command id as a
> > __u16, not __le16, yet we don't have any bug reports from big-endian
> > hosts.
> 
> Right, my bad. I thought that the pass-through target uses the same id.
> 
> Linux PT target works fine.
> 
> Bad example.
> 
> Linux kernel world is covered but I still think we need to add this ability
> for fabrics controllers as we did for pci controllers.
> 
> There are a lot of vendors out there with their optimizations and solutions
> and by adding some code to cover a broken TCP target (that no one said what
> is this target and why nobody fixed it) by default that hurts others (even
> if it's spec compliant) is not a good practice.

Could you qualify the harm this caused? The command id is just an opaque
cookie; the target should not interpret it at all, so this encoding
should be inconsequential from the target's perspective.

There are more hosts than just Linux that may encode ids with flags for
driver use, so non-compliance here is just asking for trouble.
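
To illustrate what "opaque cookie" means in practice, here is a minimal
sketch. It is not taken from any driver: the 12/4 bit split, the struct
layouts and every function name are assumptions made up for this example.
The host packs private flags into the upper bits of the 16-bit id, and a
well-behaved target only copies the id into the completion.

```c
#include <stdint.h>

/* Hypothetical host-side layout: the low 12 bits index the request,
 * the high 4 bits carry driver-private flags. A target never needs
 * to know about this split. */
#define CID_TAG_BITS 12
#define CID_TAG_MASK 0x0fffu

static inline uint16_t host_make_cid(uint16_t tag, uint8_t flags)
{
	return (uint16_t)(((flags & 0xfu) << CID_TAG_BITS) |
			  (tag & CID_TAG_MASK));
}

static inline uint16_t host_cid_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

static inline uint8_t host_cid_flags(uint16_t cid)
{
	return (uint8_t)(cid >> CID_TAG_BITS);
}

/* Simplified stand-ins for the submission and completion entries;
 * only the fields this example needs are present. */
struct sqe_hdr {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t command_id;
};

struct cqe_tail {
	uint16_t command_id;
	uint16_t status;
};

/* Target side: echo the id back unchanged, with no masking and no
 * range checks against what the target thinks a "valid" id is. */
static inline void target_complete(const struct sqe_hdr *sqe,
				   uint16_t status, struct cqe_tail *cqe)
{
	cqe->command_id = sqe->command_id;
	cqe->status = status;
}
```

A target written this way works with any host encoding, which is why the
encoding should not matter to it.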

If a vendor wants to constrain the command id for some vendor specific
optimization, they should bring forth a TPar and fight it out in the
workgroup.

We did get bug reports that, without validation of command ids, an
unexpected response can crash the kernel or corrupt data. Even though
the incorrect id is not the kernel's fault, we generally strive for
resilience against that kind of misbehavior from potentially flaky
hardware.
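
As a rough sketch of the kind of validation meant here (again not the
actual driver code: the queue depth, the 4-bit generation counter in the
upper bits and all names are assumptions for illustration), the host can
refuse any completion whose id does not match a request it has in flight:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 64u	/* hypothetical; must fit in the 12 tag bits */

struct request {
	bool    in_flight;
	uint8_t genctr;		/* bumped each time the slot is reused */
};

struct queue {
	struct request reqs[QUEUE_DEPTH];
};

/* Build the id for a new submission: generation in the top 4 bits,
 * tag in the low 12. The caller guarantees tag < QUEUE_DEPTH. */
static uint16_t submit(struct queue *q, uint16_t tag)
{
	struct request *rq = &q->reqs[tag];

	rq->genctr = (uint8_t)((rq->genctr + 1) & 0xf);
	rq->in_flight = true;
	return (uint16_t)((rq->genctr << 12) | tag);
}

/* Look up the request for a completion. Returns NULL instead of
 * touching a request when the id is out of range, not in flight,
 * or carries a stale generation. */
static struct request *find_completed(struct queue *q, uint16_t cid)
{
	uint16_t tag = cid & 0x0fff;
	uint8_t gen = (uint8_t)(cid >> 12);
	struct request *rq;

	if (tag >= QUEUE_DEPTH)
		return NULL;

	rq = &q->reqs[tag];
	if (!rq->in_flight || rq->genctr != gen)
		return NULL;

	rq->in_flight = false;
	return rq;
}
```

With a check like this, a bogus or duplicated completion is dropped
rather than completing a request that was never issued or has already
been freed.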


