[PATCHv2] nvme-tcp: align I/O cpu with blk-mq mapping
Hannes Reinecke
hare at suse.de
Wed Jun 19 08:49:20 PDT 2024
On 6/19/24 17:43, Sagi Grimberg wrote:
>
>
> On 19/06/2024 18:23, Hannes Reinecke wrote:
>> On 6/19/24 16:59, Christoph Hellwig wrote:
>>>> 	if (wq_unbound)
>>>> 		queue->io_cpu = WORK_CPU_UNBOUND;
>>>
>>> None of the above computed information is even used for the wq_unbound
>>> code. This would make a lot more sense if the above assignment was in
>>> the (only) caller.
>>>
>>> Yes, that probably should have been done when merging the wq_unbound
>>> option (which honestly looks so whacky that I wish it wasn't merged).
>>>
>> Ah, you noticed?
>>
>> This patch actually got sparked off by one of our partners reporting
>> a severe latency increase proportional to the number of controllers,
>> with a marked increase when (sw) TLS is active; they even managed to run
>> into command timeouts.
>>
>> What happens is that we shove all work from identical queue numbers
>> onto the same CPU; so if your controller only exports 4 queues _all_
>> work from qid 1 (from all controllers!) is pushed onto CPU 0, causing a
>> massive oversubscription on that CPU and leaving all other CPUs in the
>> system idle.
>
> I see how your patch addresses the case of multiple controllers falling
> into the same mappings.
> You could have selected a different mq_map entry for each controller
> (out of the entries that map to the qid).
>
I looked at it, but had no idea how to figure out the load.
The load is actually per-cpu, but we only have per-controller structures.
So we would need to introduce a per-cpu counter tracking the number of
queues scheduled on each CPU.
But that won't help with the CPU oversubscription issue; we still might
have a substantially higher number of queues overall than we have CPUs...
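
For illustration, what I had in mind for such a counter is roughly the
following (untested sketch only; nvme_tcp_cpu_count[] and
nvme_tcp_pick_least_loaded_cpu() are invented names, and the counter
would of course have to be decremented again on queue teardown):

	/*
	 * Untested sketch: one entry per possible CPU, counting the
	 * number of queues currently scheduled on that CPU.
	 */
	static atomic_t nvme_tcp_cpu_count[NR_CPUS];

	/* pick the candidate CPU with the fewest queues and claim it */
	static int nvme_tcp_pick_least_loaded_cpu(const struct cpumask *mask)
	{
		int cpu, best = -1, min_queues = INT_MAX;

		for_each_cpu(cpu, mask) {
			int n = atomic_read(&nvme_tcp_cpu_count[cpu]);

			if (n < min_queues) {
				min_queues = n;
				best = cpu;
			}
		}
		if (best >= 0)
			atomic_inc(&nvme_tcp_cpu_count[best]);
		return best;
	}

The caller would pass in the set of CPUs mapped to the queue's qid and
use the result as io_cpu; but as said, once the total number of queues
exceeds the number of CPUs someone has to share anyway.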
>>
>> Not sure how wq_unbound helps in this case; in theory the workqueue
>> items can be pushed onto arbitrary CPUs, but that only leads to even
>> worse thread bouncing.
>>
>> However, that's a topic for ALPSS. We really should have some sort of
>> backpressure here.
>
> I have a patch that has been sitting around for some time now, to make
> the RX path run directly from softirq, which should make RX execute on
> the cpu core mapped to the RSS hash.
> Perhaps you or your customer can give it a go.
>
No s**t. That is pretty much what I wanted to do.
I'll be sure to give it a go.
Thanks for that!
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich