[PATCH] nvme-multipath: add an 'ana_groups_only' module option
Hannes Reinecke
hare at suse.de
Thu Feb 10 00:17:44 PST 2022
On 2/10/22 03:52, John Meneghini wrote:
> On 2/9/22 03:07, Christoph Hellwig wrote:
>> On Mon, Feb 07, 2022 at 11:00:05AM +0100, Hannes Reinecke wrote:
>>> On large installations the ANA log buffer can be exceedingly large;
>>> we've come across a controller with 49 ANA Group Descriptors and
>>> 65536 namespaces, resulting in an ANA buffer with an order-7 allocation.
>>> And this is just to validate that the namespace ID is _really_ listed
>>> in the log page.
>>> So to avoid an overly large memory allocation we can leverage the
>>> 'RGO' bit when retrieving the ANA log page, and check whether the
>>> ANA group ID from the namespace is found in the ANA descriptors.
>>> That cuts down the memory allocation, and provides the same result.
>>> But to be on the safe side I've added a module option 'ana_groups_only'
>>> to switch between modes.
>>
>> How is this supposed to work? We'll fail to see what namespaces
>> the change applies to.
>>
>> So in doubt fix the controller config to be less broken (and say hello
>> to NetApp and explain to them that they do not need more namespaces for more
>> performance), and if that fails switch to a vmalloc allocation for
>> the buffer.
>
> I agree with Christoph. I don't see the point in supporting 65536
> namespaces across 49 ANA groups or controllers. The problem here is: the
> vendor is trying to turn NVMe into SCSI.
>
> Moreover, I don't understand how implementing this as a MODULE_PARM is
> supposed to work. If you turn this module parameter on, it assumes
> all NVMe-oF arrays connected to the host support RGO. What's really
> needed here is some kind of protocol mechanism that will allow the host
> to dynamically discover whether RGO is supported on a
> controller-by-controller basis.
>
> And this isn't a NetApp array. Just look at who's asking for this
> change if you want a clue as to what NVMe-oF array is asking for this.
> I know for a fact this isn't a NetApp array.
>
Indeed, you are correct. Surprisingly we have more customers/partners
implementing NVMe :-)
All things considered I guess I'll have to go with the kvmalloc approach.
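For reference, the size arithmetic behind the order-7 figure can be sketched
roughly as below. This is a userspace illustration, not driver code: the
helper names (ana_log_size, alloc_order) are hypothetical, the 16-byte header,
32-byte group descriptor and 4-byte NSID entry sizes are assumed from the ANA
log page layout, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, taken from the ANA log page layout. */
#define ANA_HDR_SIZE       16u  /* log page header */
#define ANA_DESC_SIZE      32u  /* one ANA group descriptor, without NSIDs */
#define ANA_NSID_SIZE       4u  /* one namespace ID entry */
#define ASSUMED_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Worst-case ANA log buffer size (hypothetical helper). With rgo set,
 * only the group descriptors are returned and no NSID lists follow.
 */
static size_t ana_log_size(uint32_t ngroups, uint32_t nnamespaces, bool rgo)
{
	size_t size = ANA_HDR_SIZE + (size_t)ngroups * ANA_DESC_SIZE;

	if (!rgo)
		size += (size_t)nnamespaces * ANA_NSID_SIZE;
	return size;
}

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int alloc_order(size_t size)
{
	size_t pages = (size + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under those assumptions, 49 groups and 65536 namespaces give
16 + 49 * 32 + 65536 * 4 = 263728 bytes, which is just over 256 KiB and so
needs 65 pages, i.e. an order-7 contiguous allocation. With RGO set the buffer
shrinks to 1584 bytes and fits in a single page.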
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer