[PATCH 2/4] nvme-tcp: align I/O cpu with blk-mq mapping

Sagi Grimberg sagi at grimberg.me
Wed Jul 3 08:03:34 PDT 2024



On 03/07/2024 17:53, Hannes Reinecke wrote:
> On 7/3/24 16:19, Sagi Grimberg wrote:
>>
>>
>> On 03/07/2024 16:50, Hannes Reinecke wrote:
>>> When 'wq_unbound' is selected we should select the
>>> first CPU from a given blk-mq hctx mapping to queue
>>> the tcp workqueue item. With this we can instruct the
>>> workqueue code to keep the I/O affinity and avoid
>>> a performance penalty.
>>
>> wq_unbound is designed to keep io_cpu UNBOUND; my recollection
>> is that the person introducing it was trying to make the io_cpu
>> always be on a specific NUMA node, or a subset of cpus within a
>> numa node. So he uses that and tinkers with the wq cpumask via
>> sysfs.
>>
>> I don't see why you are tying this to wq_unbound in the first place.
>>
> Because in the default case the workqueue is nailed to a cpu, and will
> not move from it. I.e. if you call 'queue_work_on()' it _will_ run on
> that cpu.
> But if something else is running on that CPU (printk logging, say),
> you will have to wait in the queue until the scheduler gives you some
> time.
>
> If the workqueue is unbound the workqueue code is able to switch away
> from the cpu if it finds it busy or otherwise unsuitable, leading to
> better utilization and avoiding a workqueue stall.
> And in the 'unbound' case the 'cpu' argument merely serves as a hint
> where to place the workqueue item.
> At least, that's how I understood the code.
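
For reference, the mechanical difference is just this (an illustration
against the existing nvme_tcp_wq and queue->io_work, not code from the
patch):

	/* bound workqueue: the work item is pinned to io_cpu and
	 * will only ever run on that cpu */
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);

	/* unbound workqueue (WQ_UNBOUND): the cpu argument is only a
	 * placement hint, the worker may run elsewhere if that cpu
	 * is busy */
	queue_work_on(WORK_CPU_UNBOUND, nvme_tcp_wq, &queue->io_work);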

We should make the io_cpu come from the blk-mq hctx mapping by default,
and every controller should use a different cpu from that hctx mapping.
That is the default behavior. In the wq_unbound case, we skip all of
that and make io_cpu = WORK_CPU_UNBOUND, as it was before.
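
Something along these lines (a rough sketch of what I mean, not actual
patch code; the per-controller rotation is left out for brevity):

	static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
	{
		struct blk_mq_tag_set *set = &queue->ctrl->tag_set;
		unsigned int *mq_map = set->map[HCTX_TYPE_DEFAULT].mq_map;
		int qid = nvme_tcp_queue_id(queue);
		int cpu;

		if (wq_unbound) {
			/* keep the old behavior: let the wq place the work */
			queue->io_cpu = WORK_CPU_UNBOUND;
			return;
		}

		/* take the first online cpu that blk-mq maps to this
		 * hctx (I/O queue qid corresponds to hctx qid - 1) */
		for_each_online_cpu(cpu) {
			if (mq_map[cpu] == qid - 1) {
				queue->io_cpu = cpu;
				return;
			}
		}
	}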

I'm not sure I follow your logic.

>
> And it makes the 'CPU hogged' messages go away, which is a bonus in 
> itself...

Which messages? Aren't those messages saying that the work spent too
much time on the cpu? Why are you describing the case where the work
does not get cpu quota to run?


