[PATCH 0/7] RFC patch series - NVMeTCP Offload ULP
Sagi Grimberg
sagi at grimberg.me
Wed Nov 25 20:19:35 EST 2020
>>> This patch series introduces the nvme-tcp-offload ULP host layer, which will
>>> be a new transport type called "tcp-offload" and will serve as an abstraction
>>> layer for working with vendor-specific nvme-tcp offload drivers.
>>>
>>> The nvme-tcp-offload transport can co-exist with the existing tcp and other
>>> transports. The tcp offload was designed so that stack changes are kept to a
>>> bare minimum: only registering new transports. All other APIs, ops, etc. are
>>> identical to the regular tcp transport.
>>> Representing the TCP offload as a new transport allows a clear and
>>> manageable distinction between connections that should use the offload
>>> path and those that are not offloaded (even on the same device).
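For reference, registering a new transport with the fabrics core really
is a small change; a minimal sketch follows, using the in-kernel fabrics
API (drivers/nvme/host/fabrics.h). The "tcp_offload" name and the
nvme_tcp_ofld_create_ctrl callback are placeholders, not necessarily
what the series uses:

    /* Minimal sketch: register a new transport with the existing
     * fabrics core. Only the registration is new; the create_ctrl
     * callback (hypothetical here) would build the controller just
     * like the regular tcp transport does. */
    static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
    	.name		= "tcp_offload",
    	.module		= THIS_MODULE,
    	.required_opts	= NVMF_OPT_TRADDR,
    	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_NR_IO_QUEUES,
    	.create_ctrl	= nvme_tcp_ofld_create_ctrl,
    };

    static int __init nvme_tcp_ofld_init(void)
    {
    	return nvmf_register_transport(&nvme_tcp_ofld_transport);
    }

    static void __exit nvme_tcp_ofld_exit(void)
    {
    	nvmf_unregister_transport(&nvme_tcp_ofld_transport);
    }
    module_init(nvme_tcp_ofld_init);
    module_exit(nvme_tcp_ofld_exit);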
>>
>> Why can't we extend the current NVMe-TCP driver and register vendor ops
>> to it, instead of duplicating the entire driver?
>>
>> AFAIU, only the IO path logic is vendor-specific, but all the rest is the same.
>>
>
> The reasons it's separated into a new transport (and a new module):
> 1. The offload we are adding is an offload of the entire tcp layer and most of
> the nvme-tcp layer above it. When this offload is active, there are no tcp
> connections in the stack and no tcp sockets (just like the existing in-kernel
> tcp offload models for iWARP and iSCSI). From the nvme stack's perspective it
> really is a separate transport.
> 2. The offload is not just the IO path but also the entire control plane,
> including tcp retransmissions, connection establishment and connection
> error handling.
> 3. Keeping the transports separate allows each of them to evolve without
> worrying about breaking the other.
> 4. If significant code duplication exists anywhere, we can solve it with
> shared tcp_common functions between the transports.
>
Hey, sorry it took me a while to get to this...
I'll try to give my PoV. Past experience with layering a TCP storage
driver to support both offload and SW paths (e.g. iscsi) resulted in a
high degree of difficulty changing things without breaking something
on one end or the other, and that goes for both the data plane and the
control plane.
I think that layering would lead to indirect interfaces, which in turn
can produce a higher degree of duplication. Hence I think the proposed
approach is the better way to go (yet to be proven, though).
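To make the abstraction concrete: the idea is that a vendor offload
driver plugs into the tcp-offload ULP through an ops table covering
both the control plane and the IO path, rather than hooking callbacks
into the SW nvme-tcp driver. A rough, illustrative sketch (all type
and function names here are hypothetical, not taken from the series):

    /* Illustrative only: a vendor offload driver would implement an
     * ops table along these lines and register it with the
     * tcp-offload ULP. All names here are hypothetical. */
    struct nvme_tcp_ofld_queue;
    struct nvme_tcp_ofld_req;

    struct nvme_tcp_ofld_ops {
    	const char	*name;
    	struct module	*module;
    	/* control plane: the device owns connection establishment,
    	 * tcp retransmission and connection error handling */
    	int	(*create_queue)(struct nvme_tcp_ofld_queue *queue);
    	void	(*destroy_queue)(struct nvme_tcp_ofld_queue *queue);
    	/* IO path: command submission and completion polling */
    	int	(*send_req)(struct nvme_tcp_ofld_req *req);
    	int	(*poll_queue)(struct nvme_tcp_ofld_queue *queue);
    };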