[PATCHv2] nvme-tcp: align I/O cpu with blk-mq mapping

Hannes Reinecke hare at suse.de
Wed Jun 19 08:23:57 PDT 2024


On 6/19/24 16:59, Christoph Hellwig wrote:
>>   	if (wq_unbound)
>>   		queue->io_cpu = WORK_CPU_UNBOUND;
> 
> None of the above computed information is even used for the wq_unbound
> code.  This would make a lot more sense if the above assignment was in
> the (only) caller.
> 
> Yes, that probably should have been done when merging the wq_unbound
> option (which honestly looks so whacky that I wish it wasn't merged).
> 
Ah, you noticed?

This patch was actually sparked off by one of our partners reporting
a severe latency increase proportional to the number of controllers,
with a marked increase when (sw) TLS is active; they even managed to
run into command timeouts.

What happens is that we shove all work for identical queue numbers onto
the same CPU; so if your controller only exports 4 queues, _all_ work
from qid 1 (from all controllers!) is pushed onto CPU 0, causing
massive oversubscription on that CPU and leaving all other CPUs in the
system idle.
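
Roughly, the per-queue CPU assignment boils down to this (simplified
sketch; the real nvme_tcp_set_queue_io_cpu() additionally offsets 'n'
for read and poll queues):

        /*
         * Nothing here depends on the controller, only on the queue
         * id, so qid 1 of _every_ controller lands on the same CPU.
         */
        int n = qid - 1;

        queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask,
                                          -1, false);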

Not sure how wq_unbound helps in this case; in theory the workqueue
items can be pushed to arbitrary CPUs, but that only leads to even
worse thread bouncing.
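
For reference, this is roughly how io_work gets queued (sketch of the
submission path; with wq_unbound, io_cpu is WORK_CPU_UNBOUND and the
workqueue picks a CPU at queueing time):

        /*
         * With io_cpu == WORK_CPU_UNBOUND the work item is not pinned
         * to a CPU; it can run on a different one every time, hence
         * the bouncing.
         */
        queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);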

However, that's a topic for ALPSS. We really should have some sort of
backpressure here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
