[PATCHv2] nvme-tcp: Implement recvmsg() receive flow
Sagi Grimberg
sagi at grimberg.me
Sun Nov 30 13:35:32 PST 2025
On 27/11/2025 9:52, Hannes Reinecke wrote:
> On 11/26/25 08:32, Sagi Grimberg wrote:
>>
>>
>> On 20/10/2025 11:58, Hannes Reinecke wrote:
>>> The nvme-tcp code uses the ->read_sock() interface to
>>> read data from the wire. While this interface gives us access
>>> to the skbs themselves (and so might be able to reduce latency),
>>> it does not interpret the skbs.
>>> Additionally, for TLS these skbs have to be reconstructed from
>>> the TLS stream data, rendering any advantage questionable.
>>> But the main drawback for TLS is that we do not get access to
>>> the TLS control messages, so if we receive any of those messages
>>> our only choice is to tear down the connection and restart.
>>> This patch switches the receive side over to recvmsg(), which
>>> provides full access to the TLS control messages and is also
>>> more efficient when working with TLS, as skbs do not need to be
>>> artificially constructed.
>>
>> Hannes,
>>
>> I generally agree with this approach. I'd like to point out, though,
>> that this is going to give up running RX directly from softirq
>> context.
>
> Yes.
>
>> I've gone back and forth on whether nvme-tcp should do that, but never
>> got to do a thorough comparison between the two. This probably shuts
>> the door on that option.
>>
> The thing with running from softirq context is that it would only
> make sense if we could _ensure_ that the softirq context is running
> on the cpu where the blk-mq hardware context is expecting it to.
What is this statement based on? softirq RX runs on the CPU where the
NIC interrupt lands, which eliminates the context switch to io_work on
io_cpu. io_cpu is not guaranteed to be affine to where userspace runs;
in fact, in nvme-tcp it often isn't...
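
To make the trade-off concrete: today nvme_tcp_data_ready() only does
queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work). A softirq-RX
variant would look roughly like the sketch below (hypothetical, not the
current driver; the names follow drivers/nvme/host/tcp.c):

/* Hypothetical sketch: RX inline from the data_ready callback. */
static void nvme_tcp_data_ready(struct sock *sk)
{
	struct nvme_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (likely(queue && queue->rd_enabled))
		/*
		 * Consume the skbs right here, on the CPU where the NIC
		 * interrupt landed -- no bounce to io_work on io_cpu.
		 * Note the real nvme_tcp_try_recv() takes lock_sock()
		 * and may sleep, so an actual softirq path would need a
		 * non-sleeping receive; a recvmsg()-based flow cannot
		 * run here at all.
		 */
		nvme_tcp_try_recv(queue);
	read_unlock_bh(&sk->sk_callback_lock);
}
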
> Not only would that require fiddling with RFS contexts, but we also
> found that NVMe-over-fabrics should _not_ try to align with hardware
> interrupts but rather rely on the driver to abstract things away.
I did not expect anyone to fiddle with RFS for softirq RX. The main
benefit of softirq-context RX (beyond the latency reduction) is that it
makes io_work handle ONLY TX, which is probably somewhat more efficient.
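
For reference, the recvmsg()-based flow under discussion looks roughly
like the sketch below. This is a minimal sketch, not the actual patch:
the staging buffer and the nvme_tcp_handle_tls_ctrl() /
nvme_tcp_process_pdu() helpers are hypothetical placeholders, while
SOL_TLS, TLS_GET_RECORD_TYPE and TLS_RECORD_TYPE_DATA are the standard
kTLS definitions.

/*
 * Sketch of a recvmsg()-based receive, assuming kTLS is enabled on the
 * socket.  The TLS record type arrives as a control message, which is
 * the information ->read_sock() never surfaced.
 */
static int nvme_tcp_recv_msg(struct nvme_tcp_queue *queue)
{
	char cbuf[CMSG_LEN(sizeof(u8))] = {};
	struct msghdr msg = {
		.msg_flags = MSG_DONTWAIT,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	struct kvec iov = {
		.iov_base = queue->rcv_buf,	/* hypothetical staging buffer */
		.iov_len = sizeof(queue->rcv_buf),
	};
	struct cmsghdr *cmsg = (struct cmsghdr *)cbuf;
	int len;

	len = kernel_recvmsg(queue->sock, &msg, &iov, 1, iov.iov_len,
			     msg.msg_flags);
	if (len <= 0)
		return len;

	/*
	 * A non-data record (e.g. a TLS alert) can now be handled
	 * gracefully instead of tearing down the connection.
	 */
	if (cmsg->cmsg_level == SOL_TLS &&
	    cmsg->cmsg_type == TLS_GET_RECORD_TYPE &&
	    *(u8 *)CMSG_DATA(cmsg) != TLS_RECORD_TYPE_DATA)
		return nvme_tcp_handle_tls_ctrl(queue, cbuf, len);

	return nvme_tcp_process_pdu(queue, len);
}

And since nothing here requires skb access, the RX side no longer cares
that TLS records had to be reassembled from the stream, which is the
efficiency argument from the patch description.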