[PATCH] blk-mq-rdma: remove queue mapping helper for rdma devices

Sagi Grimberg sagi at grimberg.me
Sun Mar 26 00:12:33 PDT 2023


>>>>>>>> No rdma device exposes its irq vectors affinity today. So the only
>>>>>>>> mapping that we have left is the default blk_mq_map_queues, which
>>>>>>>> we fall back to anyway. Also fix up the only consumer of this helper
>>>>>>>> (nvme-rdma).
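
For context, the fixup boils down to nvme-rdma's map_queues callback
calling the block layer default directly instead of going through the
rdma helper. A simplified sketch (not the exact patch hunk):

#include <linux/blk-mq.h>

/* simplified sketch: with blk_mq_rdma_map_queues() gone, nvme-rdma
 * falls back to the default blk-mq mapping for its queue maps
 */
static void nvme_rdma_map_queues(struct blk_mq_tag_set *set)
{
	/* was: blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
	 *                             ctrl->device->dev, 0);
	 */
	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
}
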
>>>>>>>
>>>>>>> This was the only caller of ib_get_vector_affinity(), so please delete
>>>>>>> the get_vector_affinity op and ib_get_vector_affinity() from verbs as well.
>>>>>>
>>>>>> Yep, no problem.
>>>>>>
>>>>>> Given that nvme-rdma was the only consumer, do you prefer that this
>>>>>> goes in via the nvme tree?
>>>>>
>>>>> Sure, it is probably fine
>>>>
>>>> I tried to do it two+ years ago:
>>>> https://lore.kernel.org/all/20200929091358.421086-1-leon@kernel.org
>>>
>>> Christoph's points make sense, but I think we should still purge this
>>> code.
>>>
>>> If we want to do proper managed affinity, the right RDMA API is one
>>> that directly asks for the desired CPU binding when creating the CQ,
>>> with an optional way to change the CQ's CPU binding at runtime.
>>
>> I think the affinity management here refers to IRQD_AFFINITY_MANAGED,
>> which IIRC is the case when the driver passes a `struct irq_affinity` to
>> pci_alloc_irq_vectors_affinity().
>>
>> Not sure what that has to do with passing a cpu to create_cq.
> 
> I took Christoph's remarks to be that the system should auto-configure
> interrupts sensibly and not rely on userspace messing around in proc.

Yes, that is correct.

> For instance, I would expect the NVMe driver to work the same way on
> RDMA and PCI. For PCI it calls pci_alloc_irq_vectors_affinity(); RDMA
> should call some ib_alloc_cq_affinity() and generate the affinity in
> exactly the same way.
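
For reference, today's verb only takes a completion vector index, not a
CPU; the ib_alloc_cq_affinity() above would be a new, hypothetical API.
The current shape is roughly this (example_ulp_create_cq is a made-up
name, just for illustration):

#include <rdma/ib_verbs.h>

/* Current API shape: the ULP picks a completion vector index; which CPU
 * ends up servicing that vector is whatever the device driver and
 * userspace decided. There is no way to request a CPU binding directly.
 */
static struct ib_cq *example_ulp_create_cq(struct ib_device *dev,
					   int nr_cqe, int comp_vector)
{
	return ib_alloc_cq(dev, NULL, nr_cqe, comp_vector, IB_POLL_SOFTIRQ);
}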

But an RDMA ULP does not own the EQs the way the nvme driver does.
That is why NVMe is fine with managed affinity and RDMA is not.
The initial attempt was to make RDMA use managed affinity, but then
users started complaining that they were no longer able to change the
irq vector affinity via procfs.
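
To make the "managed" part concrete: on the PCI side the driver hands a
struct irq_affinity to the core, the core spreads the vectors over the
CPUs and marks them IRQD_AFFINITY_MANAGED, and from that point userspace
can no longer rewrite them via /proc/irq/*/smp_affinity. A simplified
sketch (not the actual nvme-pci code, function name made up):

#include <linux/pci.h>
#include <linux/interrupt.h>

/* simplified sketch of managed irq affinity on the PCI side: passing a
 * struct irq_affinity makes the core spread and manage the vectors, so
 * their affinity cannot be changed later from procfs.
 */
static int example_setup_irqs(struct pci_dev *pdev, unsigned int nr_io_queues)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* e.g. one vector kept aside for the admin queue */
	};

	return pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
}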

> So, I have no problem with deleting these things, as the
> get_vector_affinity API is not part of solving the affinity problem,
> and it seems NVMe PCI doesn't need blk_mq_rdma_map_queues() either.

Cool.


