[PATCH v3 0/4] nvme: improve error handling and ana_state to work well with dm-multipath
Mike Snitzer
snitzer at redhat.com
Sat May 1 16:19:28 BST 2021
On Sat, May 01 2021 at 7:58am -0400,
Hannes Reinecke <hare at suse.de> wrote:
> On 4/20/21 5:46 PM, Laurence Oberman wrote:
> [ .. ]
> >
> >Let me add some reasons why as primarily a support person that this is
> >important and try avoid another combative situation.
> >
> >Customers depend on managing device-mapper-multipath the way it is now
> >even with the advent of nvme-over-F/C. Years of administration and
> >management for multiple Enterprise O/S vendor customers (Suse/Red Hat,
> >Oracle) all depend on managing multipath access in a transparent way.
> >
> >I respect everybody's point of view here but native does change log
> >alerting and recovery and that is what will take time for customers to
> >adopt.
> >
> >It is going to take time for Enterprise customers to transition so all
> >we want is an option for them. At some point they will move to native
> >but we always like to keep in step with upstream as much as possible.
> >
> >Of course we could live with RHEL-only for while but that defeats our
> >intention to be as close to upstream as possible.
> >
> >If we could have this accepted upstream for now perhaps when customers
> >are ready to move to native only we could phase this out.
> >
> >Any technical reason why this would not fly is of course important to
> >consider but perhaps for now we have a parallel option until we dont.
> >
> Curiously, we (as in we as SLES developers) have found just the opposite.
> NVMe is a new technology, and out of necessity there will not be any
> existing installations where we have to be compatible with.
> We have switched to native NVMe multipathing with SLE15, and decided
> to educate customers that NVMe is a different concept than SCSI, and
> one shouldn't try treat both the same way.
As you well know: dm-multipath was first engineered to handle SCSI, and
it was too tightly coupled at the start, but the scsi_dh interface
provided sorely missing abstraction. With NVMe, dm-multipath was
further enhanced to not do work only needed for SCSI.
Seems SUSE has forgotten that dm-multipath has taken strides to properly
abstract away SCSI specific details, at least this patchset forgot it
(because proper layering/abstraction is too hard? that mantra is what
gave native NVMe multipath life BTW):
https://patchwork.kernel.org/project/dm-devel/list/?series=475271
Long story short, there is utility in dm-multipath being transport
agnostic with specialized protocol specific bits properly abstracted.
If you or others don't care about any of that anymore, that's fine! But
it doesn't mean others don't. Thankfully both can exist perfectly fine,
sadly that clearly isn't possible without absurd tribal fighting (at
least in the context of NVMe).
And to be clear Hannes: your quick review of this patchset couldn't have
been less helpful or informed. Yet it enabled NVMe maintainers to ignore
technical review (you gave them cover).
The lack of proper technical review of this patchset was astonishing but
hch's dysfunctional attack that took its place really _should_ concern
others. Seems it doesn't, must be nice to not have a dog in the fight
other than philosophical ideals that enable turning a blind eye.
> This was helped by the
> fact the SLE15 is a new release, so customers were accustomed to
> having to change bits and pieces in their infrastructure to support
> new releases.
Sounds like you either have very few customers and/or they don't use
layers that were engineered with dm-multipath being an integral layer
in the IO stack. That's fine, but that doesn't prove anything other
than your limited experience.
> Overall it worked reasonably well; we sure found plenty of bugs, but
> that was kind of expected, and for bad or worse nearly all of them
> turned out to be upstream issues. Which was good for us (nothing
> beats being able to blame things on upstream, if one is careful to
> not linger too much on the fact that one is part of upstream); and
> upstream these things will need to be fixed anyway.
> So we had a bit of a mixed experience, but customers seemed to be
> happy enough with this step.
>
> Sorry about that :-)
Nothing to be sorry about, good on you and the others at SUSE
engineering for improving native NVMe multipathing. Red Hat supports it
too, so your and others' efforts are appreciated there.
Mike
More information about the Linux-nvme
mailing list