[PATCH v24 01/20] net: Introduce direct data placement tcp offload

Aurelien Aptel aaptel at nvidia.com
Mon May 6 05:28:11 PDT 2024


Sagi Grimberg <sagi at grimberg.me> writes:
> Understood. It is usually the case as io threads are not aligned to the
> rss steering rules (unless
> arfs is used).

The steering rule we add has 2 effects:
1) steer the flow to the offload object on the HW
2) provide CPU alignement with the io threads (Rx affinity similar to
   aRFS)

We understand point (2) might not be achieved with unbounded queue
scenarios. That's fine.

>> But when it is bound, which is still the default common case, we will
>> benefit from the alignment. To not lose that benefit for the default
>> most common case, we would like to keep cfg->io_cpu.
>
> Well, this explanation is much more reasonable. Setting .affinity_hint
> argument
> seems like a proper argument to the interface and nvme-tcp can set it to
> queue->io_cpu.

Ok, we will rename cfg->io_cpu to ->affinity_hint.

>> Could you clarify what are the advantages of running unbounded queues,
>> or to handle RX on a different cpu than the current io_cpu?
>
> See the discussion related to the patch from Li Feng:
> https://lore.kernel.org/lkml/20230413062339.2454616-1-fengli@smartx.com/

Thanks. As said previously, we are fine with decreased perfs in this
edge case.

>>> nvme-tcp may handle rx side directly from .data_ready() in the future, what
>>> will the offload do in that case?
>> It is not clear to us what the benefit of handling rx in .data_ready()
>> will achieve. From our experiment, ->sk_data_ready() is called either
>> from queue->io_cpu, or sk->sk_incoming_cpu. Unless you enable aRFS,
>> sk_incoming_cpu will be constant for the whole connection. Can you
>> clarify would handling RX from data_ready() provide?
>
> Save the context switching to a kthread from softirq, can reduce latency
> substantially
> for some workloads.

Ok, thanks for the explanation. With bounded queues and steering the
softirq CPU will be aligned with the offload CPU, so we think this is
also OK.

Thanks



More information about the Linux-nvme mailing list