[PATCH] nvme-tcp: align I/O cpu with blk-mq mapping

Hannes Reinecke hare at suse.de
Wed Jun 19 03:01:08 PDT 2024


On 6/19/24 10:57, Sagi Grimberg wrote:
> 
> 
> On 19/06/2024 8:30, Christoph Hellwig wrote:
>> On Tue, Jun 18, 2024 at 02:03:45PM +0200, Hannes Reinecke wrote:
>>> Add a new module parameter 'wq_affinity' to spread the I/O
>>> over all cpus within the blk-mq hctx mapping for the queue.
>>> This avoids bouncing I/O between cpus when we have less
>>> hardware queues than cpus.
>> What is the benefit when setting it?  What is the downside?  Why do you
>> think it needs to be conditional?
> 
> Not entirely sure.
> 

Hehe. But I am :-)

> Hannes,
> 
> Can you please show what is the hctx<->cpu mappings before/after this
> patch? Please use different queue counts so we can see the pattern.
> 
> Can you please show any performance comparisons for low and high
> queue counts?
> 
> I'm not entirely clear what exactly "spreads I/O over all CPUs" mean.
> Every nvme-tcp queue has an io context that does network sends and receives
> (unless it goes within .queue_rq() context when possible). Does this
> somehow mean that now a queue io context will run from multiple cpus
> at the same time?

We have one connection per queue and do a 'queue_work_on()' for each
queue, based on the value of 'io_cpu'.
That cpu should align with the set of CPUs of the associated hwctx,
otherwise the io_work thread has to bounce from one cpu to another.
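
As a sketch (the queue_work_on() call matches the current driver, but
the helper name here is made up for illustration), the per-queue
dispatch is:

  /* one io_work per queue; it does the socket sends and receives */
  static void nvme_tcp_kick_queue(struct nvme_tcp_queue *queue)
  {
          queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
  }

so whatever we pick for 'io_cpu' is where the network processing for
that queue ends up running.
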
Case in point: on this AMD machine I get this blk-mq mapping:

queue 0: 6, 7, 8, 54, 55, 56
queue 1: 9, 10, 11, 57, 58, 59
queue 2: 18, 19, 20, 66, 67, 68
queue 3: 21, 22, 23, 69, 70, 71
queue 4: 24, 25, 26, 72, 73, 74
queue 5: 27, 28, 29, 75, 76, 77
queue 6: 30, 31, 32, 78, 79, 80
queue 7: 33, 34, 35, 81, 82, 83
queue 8: 36, 37, 38, 84, 85, 86
queue 9: 39, 40, 41, 87, 88, 89
queue 10: 42, 43, 44, 90, 91, 92
queue 11: 45, 46, 47, 93, 94, 95
queue 12: 12, 13, 14, 60, 61, 62
queue 13: 15, 16, 17, 63, 64, 65
queue 14: 0, 1, 2, 48, 49, 50
queue 15: 3, 4, 5, 51, 52, 53

- default behaviour:

nvme nvme1: mapped 16/0/0 default/read/poll queues.
nvme nvme1: queue 1: using cpu 0
nvme nvme1: queue 2: using cpu 1
nvme nvme1: queue 3: using cpu 2
nvme nvme1: queue 4: using cpu 3
nvme nvme1: queue 5: using cpu 4
nvme nvme1: queue 6: using cpu 5
nvme nvme1: queue 7: using cpu 6
nvme nvme1: queue 8: using cpu 7
nvme nvme1: queue 9: using cpu 8
nvme nvme1: queue 10: using cpu 9
nvme nvme1: queue 11: using cpu 10
nvme nvme1: queue 12: using cpu 11
nvme nvme1: queue 13: using cpu 12
nvme nvme1: queue 14: using cpu 13
nvme nvme1: queue 15: using cpu 14
nvme nvme1: queue 16: using cpu 15

So only queue 13 (mapped to hwctx 12) is using a correct CPU; all
other queues are scheduled on cpus which are not part of their
hwctx map.
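
For the default queues the selection effectively boils down to
(simplified; the real code also accounts for read/poll queue sets):

  /* io queue qid (1-based) -> online cpu number (qid - 1),
   * ignoring the blk-mq hctx cpumask entirely */
  queue->io_cpu = (qid - 1) % num_online_cpus();

which is exactly the pattern in the log above.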

- wq_affinity behaviour:

nvme nvme1: queue 1: using cpu 6
nvme nvme1: queue 2: using cpu 9
nvme nvme1: queue 3: using cpu 18
nvme nvme1: queue 4: using cpu 21
nvme nvme1: queue 5: using cpu 24
nvme nvme1: queue 6: using cpu 27
nvme nvme1: queue 7: using cpu 30
nvme nvme1: queue 8: using cpu 33
nvme nvme1: queue 9: using cpu 36
nvme nvme1: queue 10: using cpu 39
nvme nvme1: queue 11: using cpu 42
nvme nvme1: queue 12: using cpu 45
nvme nvme1: queue 13: using cpu 12
nvme nvme1: queue 14: using cpu 15
nvme nvme1: queue 15: using cpu 0
nvme nvme1: queue 16: using cpu 3

All queues are scheduled on a cpu which is part of their
blk-mq hwctx map.
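
The selection here is roughly (a sketch of the idea, helper name made
up, not the literal patch):

  static int nvme_tcp_hctx_io_cpu(struct blk_mq_hw_ctx *hctx)
  {
          int cpu;

          /* pick a cpu out of the set blk-mq mapped to this hctx,
           * e.g. the first online one, so io_work runs on a cpu
           * which also submits I/O for this queue */
          for_each_cpu(cpu, hctx->cpumask)
                  if (cpu_online(cpu))
                          return cpu;

          return WORK_CPU_UNBOUND;
  }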

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich



