[PATCH 00/35] RFC: add "nvme monitor" subcommand

Hannes Reinecke hare at suse.de
Thu Feb 4 02:52:04 EST 2021


On 1/29/21 9:27 PM, Martin Wilck wrote:
> On Fri, 2021-01-29 at 12:08 -0800, Sagi Grimberg wrote:
>>
>>>>>
>>>>> This method for discovery and autodetection has some advantages
>>>>> over the current udev-rule based approach:
>>>>>
>>>>>     * By using the `--persistent` option, users can easily control
>>>>>       whether persistent discovery controllers for discovered
>>>>>       transport addresses should be created and monitored for AEN
>>>>>       events. **nvme monitor** watches known transport addresses,
>>>>>       creates discovery controllers as required, and re-uses
>>>>>       existing ones if possible.
>>>>
>>>> What does that mean?
>>>
>>> In general, if the monitor detects a new host_traddr/traddr/trsvcid
>>> tuple, it runs a discovery on it, and keeps the discovery controller
>>> open if --persistent was given. On startup, it scans existing
>>> controllers, and if it finds already existing discovery controllers,
>>> it re-uses them. These will not be shut down when the monitor exits.
>>
>> And if it doesn't run with --persistent it deletes them? Even if
>> these weren't created by it?
> 
> No, not those. That's what I wanted to express with "These will not be
> shut down when the monitor exits."
> 
> It's not trivial to determine reliably which controllers the monitor
> created and which it didn't. The current code assumes that all
> discovery controllers that didn't exist at startup were created by the
> monitor. We can improve the intelligence of the tool in that area, of
> course. Currently it makes this dumb assumption.
> 
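Just to make that bookkeeping concrete (a sketch of the idea only, not
the actual monitor code): discovery controllers can be told apart from
I/O controllers by the well-known discovery NQN in sysfs, so the monitor
can record which ones pre-exist at startup and leave exactly those alone
on exit:

  # Sketch only: record pre-existing discovery controllers at startup.
  # Anything not in this list is assumed to have been created by the
  # monitor and is fair game for teardown on exit.
  DISC_NQN="nqn.2014-08.org.nvmexpress.discovery"
  for c in /sys/class/nvme/nvme*; do
      [ "$(cat "$c/subsysnqn" 2>/dev/null)" = "$DISC_NQN" ] && basename "$c"
  done > /run/nvme-monitor.preexisting   # file name made up for the example
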
>>
>> And, if none exist where does it get new discovery controller
>> details?
>> Today we use discovery.conf for that.
> 
> As I said, this can be done with a separate "nvme connect-all" call,
> which could be run from an ExecStartPre or from a separate service.
> Or we can integrate the functionality in the monitor. I don't have a
> strong opinion either way. I can adapt to what you guys prefer.
> 
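For reference, the ExecStartPre variant could look roughly like the
sketch below. Unit name and paths are illustrative; 'nvme monitor' and
its --persistent option are what this RFC proposes, and 'nvme
connect-all' without arguments reads /etc/nvme/discovery.conf:

  # nvme-monitor.service -- illustrative sketch, not a shipped unit
  [Unit]
  Description=NVMe-oF transport address monitor
  After=network-online.target

  [Service]
  Type=simple
  # Seed connections from /etc/nvme/discovery.conf first ("-" tolerates
  # a missing or empty config file) ...
  ExecStartPre=-/usr/sbin/nvme connect-all
  # ... then keep watching for changes.
  ExecStart=/usr/sbin/nvme monitor --persistent
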
>>
>>> This allows users fine-grained control over which discovery
>>> controllers to operate persistently. Users who want all discovery
>>> controllers to be persistent just use --persistent. Others can set
>>> up those that they want to have (manually or with a script), and
>>> not use --persistent.
>>>
>>> The background is that hosts may not need every detected discovery
>>> controller to be persistent. In multipath scenarios, you may see
>>> more discovery subsystems than anything else, and not everyone
>>> likes that. That's a generic issue and unrelated to the monitor,
>>> but running the monitor with --persistent creates discovery
>>> controllers that would otherwise not be visible.
>>>
>>> Hope this clarifies it.
>>
>> Well, how do people expect to know when things change without a
>> persistent discovery controller? I'm not sure that people seeing
>> more discovery controllers is a real issue, given what they are
>> getting from them.
> 
> You're right. At this very early stage, I wanted to give users the
> freedom of choice, and not force dozens of discovery controller
> connections upon everyone. If there's consensus that the discovery
> connections should always be created, fine with me.
> 
> 
I would make the monitor code always create persistent discovery
connections (or, at least, _attempt_ to create persistent discovery
connections).

_If_ the user starts the monitor, he will have a reason for it; and that
reason typically is that he _wants_ the system to react to discovery
changes.
And that is precisely why persistent discovery controllers exist.

Plus, there is a provision in the spec allowing the controller to reject
persistent connections, so if the controller is worried about increased
resource consumption it can simply disallow persistent controller
connections.

Similarly, I would keep the created discovery connections after exit.
The simple reason is that we currently don't store the information from
the created discovery connections, so if we tear them down on exit we
won't be able to recreate them once the monitor is restarted.

This situation might change once we have mDNS integration; and, of
course, for FC figuring out this information is trivial.
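"The information" here is the full transport tuple; recreating a
discovery connection by hand needs all of it, e.g. (made-up example
addresses, just to show what would have to be persisted):

  # Example values only -- transport, addresses and service id would
  # have to be stored somewhere to redo this after a teardown.
  nvme discover --transport=tcp --traddr=192.168.1.10 \
       --trsvcid=8009 --host-traddr=192.168.1.1 --persistent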

>>> I agree. But it's not easy to fix the issue otherwise. In the
>>> customer problem where we observed it, I worked around it by adding
>>> the udev seqnum to the "instance name" of the systemd service, thus
>>> allowing several "nvme connect-all" processes to run for the same
>>> transport address simultaneously. But I don't think that would
>>> scale well; the monitor can handle it more cleanly.
>>
>> Still changing the entire thing for a corner case...
> 
> It's just one puzzle piece.
> 
And it's actually not a corner case, but a real issue we're seeing in
our (or rather, our customers') deployments.
Several AENs are received at the same time, causing multiple 'nvme
discover' instances to run in parallel with the same information.
A very nice exercise in finding all race conditions in the code.
And completely pointless to boot.
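
For context, the workaround mentioned above boils down to putting the
udev sequence number into the systemd instance name, so two discovery
AENs for the same transport address no longer map to the same unit.
Very roughly (a paraphrase of the idea, not the actual shipped
70-nvmf-autoconnect.rules; it assumes SEQNUM is available in the event
environment):

  ACTION=="change", SUBSYSTEM=="nvme", ENV{NVME_EVENT}=="discovery", \
    TAG+="systemd", \
    ENV{SYSTEMD_WANTS}+="nvmf-connect@$env{SEQNUM}-$env{NVME_TRTYPE}-$env{NVME_TRADDR}-$env{NVME_TRSVCID}.service"

Each event then starts its own service instance instead of being
coalesced with an already-running one, which is exactly what makes the
parallel runs possible (and racy) in the first place.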

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


