[PATCH v2] nvmet: force reconnect when number of queue changes

Knight, Frederick Frederick.Knight at netapp.com
Thu Oct 6 13:54:45 PDT 2022


YES!  100%

Don't retry (DNR) - IF you are resending exactly the same command with exactly the same parameters, it is likely to fail for the same reason it did the first time. If you change the commands parameters (maybe because you're trying to fix whatever caused the failure), then DNR doesn't apply.

	Fred

> -----Original Message-----
> From: James Smart <jsmart2021 at gmail.com>
> Sent: Thursday, October 6, 2022 4:15 PM
> To: Sagi Grimberg <sagi at grimberg.me>; Daniel Wagner
> <dwagner at suse.de>
> Cc: Knight, Frederick <Frederick.Knight at netapp.com>; linux-
> nvme at lists.infradead.org; Shinichiro Kawasaki
> <shinichiro.kawasaki at wdc.com>; hare at suse.de
> Subject: Re: [PATCH v2] nvmet: force reconnect when number of queue
> changes
> 
> NetApp Security WARNING: This is an external email. Do not click links or
> open attachments unless you recognize the sender and know the content is
> safe.
> 
> 
> 
> 
> On 10/6/2022 4:37 AM, Sagi Grimberg wrote:
> >
> >>>> As far I can tell, what's is missing from a testing point of view
> >>>> is the ability to fail requests without the DNR bit set or the
> >>>> ability to tell the host to reconnect. Obviously, an AEN would be
> >>>> nice for this but I don't know if this is reason enough to extend
> >>>> the spec.
> >>>
> >>> Looking into the code, its the connect that fails on invalid
> >>> parameter with a DNR, because the host is attempting to connect to a
> >>> subsystems that does not exist on the port (because it was taken
> >>> offline for maintenance reasons).
> >>>
> >>> So I guess it is valid to allow queue change without removing it
> >>> from the port, but that does not change the fundamental question on
> DNR.
> >>> If the host sees a DNR error on connect, my interpretation is that
> >>> the host should not retry the connect command itself, but it
> >>> shouldn't imply anything on tearing down the controller and giving
> >>> up on it completely, forever.
> >>
> >> Okay, let me try to avoid the DNR discussion for now and propose
> >> something else? What about adding a 'enable' attribute to the subsys?
> >>
> >> The snipped below does the trick. Though There is no explicit
> >> synchronization between host and target, so it's possible the host
> >> doesn't notice that the subsystem toggled enabled and updated the
> >> number queues. But not sure if it's worth to address this, it feels a
> >> bit over-engineered.
> >
> > I think that for the matter of this patch, you can keep force reconnect.
> >
> > But I still think we need to be consistent with the different
> > transports on how we interperet controller returning DNR...
> >
> 
> I agree - behavior should be the same regardless of transport.  DNR is pretty
> specific in its definition - "If the same command is re-submitted to any
> controller in the NVM subsystem, then that re-submitted command is
> expected to fail"
> 
> But, what we forget is "the command" is actually "command X with fields set
> this way".  If we change the fields, it may actually succeed.
> 
> So if we're re-issuing Connect in the same way without any changing
> values, we're better off not reconencting.   But if Connect changes,
> we're back to ground 0.
> 
> -- james



More information about the Linux-nvme mailing list