[PATCH v24 01/20] net: Introduce direct data placement tcp offload
Sagi Grimberg
sagi at grimberg.me
Fri May 3 00:31:50 PDT 2024
On 5/2/24 10:04, Aurelien Aptel wrote:
> Sagi Grimberg <sagi at grimberg.me> writes:
>> Well, you cannot rely on the fact that the application will be pinned to a
>> specific cpu core. That may be the case by accident, but you must not and
>> cannot assume it.
> Just to be clear, any CPU can read from the socket and benefit from the
> offload but there will be an extra cost if the queue CPU is different
> from the offload CPU. We use cfg->io_cpu as a hint.
Understood. That is usually the case, as io threads are not aligned to the
RSS steering rules (unless aRFS is used).
>
>> Even today, nvme-tcp has an option to run from an unbound wq context,
>> where queue->io_cpu is set to WORK_CPU_UNBOUND. What are you going to
>> do there?
> When the CPU is not bound to a specific core, we will most likely always
> have CPU misalignment and the extra cost that goes with it.
Yes, as done today.
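
For reference, roughly what nvme-tcp does today (paraphrased, not verbatim
upstream code; the wrapper function name here is just for illustration,
wq_unbound is the existing module parameter):

/*
 * Rough paraphrase of current nvme-tcp behavior: with the wq_unbound
 * module parameter set, io_work runs on an unbound workqueue and
 * queue->io_cpu carries no meaningful CPU hint.
 */
static void nvme_tcp_pick_io_cpu(struct nvme_tcp_queue *queue)
{
	if (wq_unbound)
		queue->io_cpu = WORK_CPU_UNBOUND;	/* no affinity */
	else
		nvme_tcp_set_queue_io_cpu(queue);	/* pin io_work to a queue-specific CPU */
}
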
>
> But when it is bound, which is still the default common case, we will
> benefit from the alignment. To not lose that benefit for the default
> most common case, we would like to keep cfg->io_cpu.
Well, this explanation is much more reasonable. An .affinity_hint argument
seems like a proper addition to the interface, and nvme-tcp can set it to
queue->io_cpu.
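
Something along these lines (a sketch only; the struct and field names are
placeholders approximating the series, not its final API):

/*
 * Illustrative sketch of the suggestion above; names are placeholders.
 * The ULP passes its preferred CPU purely as a hint, and the offload
 * still works (at extra cost) if rx lands on a different CPU.
 */
struct ulp_ddp_config {
	/* ... existing fields from the series ... */
	int affinity_hint;	/* preferred CPU, may be WORK_CPU_UNBOUND */
};

/* nvme-tcp side, when setting up the offload for a queue */
cfg->affinity_hint = queue->io_cpu;
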
>
> Could you clarify what the advantages are of running unbounded queues,
> or of handling RX on a different cpu than the current io_cpu?
See the discussion related to the patch from Li Feng:
https://lore.kernel.org/lkml/20230413062339.2454616-1-fengli@smartx.com/
>
>> nvme-tcp may handle rx side directly from .data_ready() in the future, what
>> will the offload do in that case?
> It is not clear to us what benefit handling rx in .data_ready()
> would bring. From our experiment, ->sk_data_ready() is called either
> from queue->io_cpu, or sk->sk_incoming_cpu. Unless you enable aRFS,
> sk_incoming_cpu will be constant for the whole connection. Can you
> clarify what handling RX from data_ready() would provide?
Saving the context switch from softirq to a kthread can reduce latency
substantially for some workloads.
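
For illustration, the difference is roughly this (a sketch, not a proposed
patch; the first function is a simplified version of today's callback with
locking and the polling-flag check omitted, and the second variant would
require the recv path to be made softirq-safe):

/*
 * Today: ->sk_data_ready() runs in softirq and only schedules io_work,
 * so every rx completion pays a softirq -> kthread context switch.
 * (Simplified: the real callback also takes sk_callback_lock and skips
 * queues in polling mode.)
 */
static void nvme_tcp_data_ready(struct sock *sk)
{
	struct nvme_tcp_queue *queue = sk->sk_user_data;

	if (likely(queue && queue->rd_enabled))
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}

/*
 * Possible future direction mentioned above: consume the data directly
 * from softirq context, on whatever CPU the NIC delivered it to, and
 * skip the context switch entirely.
 */
static void nvme_tcp_data_ready_inline(struct sock *sk)
{
	struct nvme_tcp_queue *queue = sk->sk_user_data;

	if (likely(queue && queue->rd_enabled))
		nvme_tcp_try_recv(queue);	/* would need to be softirq-safe */
}
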