[PATCH v11 00/25] nvme-tcp receive offloads

Sagi Grimberg sagi at grimberg.me
Thu Feb 23 07:33:40 PST 2023


> Hi,
> 
> Here is the next iteration of our nvme-tcp receive offload series.
> 
> The main changes are in patch 3 (netlink).
> 
> Rebased on top of today net-next
> 8065c0e13f98 ("Merge branch 'yt8531-support'")
> 
> The changes are also available through git:
> 
> Repo: https://github.com/aaptel/linux.git branch nvme-rx-offload-v11
> Web: https://github.com/aaptel/linux/tree/nvme-rx-offload-v11
> 
> The NVMeTCP offload was presented in netdev 0x16 (video now available):
> - https://netdevconf.info/0x16/session.html?NVMeTCP-Offload-%E2%80%93-Implementation-and-Performance-Gains
> - https://youtu.be/W74TR-SNgi4
> 
> From: Aurelien Aptel <aaptel at nvidia.com>
> From: Shai Malin <smalin at nvidia.com>
> From: Ben Ben-Ishay <benishay at nvidia.com>
> From: Boris Pismenny <borisp at nvidia.com>
> From: Or Gerlitz <ogerlitz at nvidia.com>
> From: Yoray Zack <yorayz at nvidia.com>

Hey Aurelien and Co,

I've spent some time today looking at the last iteration of this.
What I cannot understand is: how will this ever be used outside
of the kernel nvme-tcp host driver?

It seems that the interface is designed to fit only a kernel
consumer, and a very specific one at that.

Have you considered using a more standard interface here, such
that spdk or an io_uring based initiator could use it?

To me it appears that (rough sketch of what I mean follows the list):
- ddp limits can be obtained via getsockopt
- sk_add/sk_del can be done via setsockopt
- offloaded DDGST crc can be obtained via something like
   msghdr.msg_control
- Perhaps for setting up the offload per IO, recvmsg would be the
   vehicle, with a new msg flag MSG_RCV_DDP or something that would hide
   all the details of what the HW needs (the command_id would be set
   somewhere in the msghdr).
- And all of the resync flow would be something that a separate
   ulp socket provider would take care of, similar to how TLS presents
   itself to a tcp application, so the application does not need to be
   aware of it.
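
To make that concrete, here is a rough sketch of what such a
socket-level interface might look like from a userspace initiator's
point of view. To be clear, none of these names exist today:
SOL_ULP_DDP, ULP_DDP_LIMITS, ULP_DDP_SK_ADD, SCM_ULP_DDGST,
MSG_RCV_DDP and the ulp_ddp_limits struct are all made up just to
show the shape of the thing, not a proposal for the actual uapi.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SOL_ULP_DDP	299		/* hypothetical socket level */
#define ULP_DDP_LIMITS	1		/* hypothetical getsockopt optname */
#define ULP_DDP_SK_ADD	2		/* hypothetical setsockopt optname */
#define SCM_ULP_DDGST	1		/* hypothetical cmsg type */
#define MSG_RCV_DDP	0x10000000	/* hypothetical recvmsg flag */

struct ulp_ddp_limits {			/* filled in by getsockopt() */
	unsigned int max_ddp_sgl_len;
	unsigned int io_threshold;
};

/* ddp limits via getsockopt, sk_add via setsockopt */
static int setup_offload(int fd, int queue_id)
{
	struct ulp_ddp_limits limits;
	socklen_t len = sizeof(limits);

	if (getsockopt(fd, SOL_ULP_DDP, ULP_DDP_LIMITS, &limits, &len) < 0)
		return -1;

	return setsockopt(fd, SOL_ULP_DDP, ULP_DDP_SK_ADD,
			  &queue_id, sizeof(queue_id));
}

/* per-IO setup and offloaded DDGST result via recvmsg */
static int recv_one_pdu(int fd, void *buf, size_t buflen,
			unsigned short command_id)
{
	struct iovec iov = { .iov_base = buf, .iov_len = buflen };
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *c;

	/* where the command_id travels (new msghdr field? input cmsg?)
	 * is exactly the kind of detail that would be hidden here */
	(void)command_id;

	if (recvmsg(fd, &msg, MSG_RCV_DDP) < 0)
		return -1;

	/* DDGST verdict reported back as ancillary data */
	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == SOL_ULP_DDP &&
		    c->cmsg_type == SCM_ULP_DDGST) {
			int crc_ok;

			memcpy(&crc_ok, CMSG_DATA(c), sizeof(crc_ok));
			if (!crc_ok)
				fprintf(stderr, "ddgst mismatch\n");
		}
	}
	return 0;
}

Note that the resync flow does not show up in there at all; like kTLS
(which installs itself with setsockopt(sk, SOL_TCP, TCP_ULP, "tls")),
a ddp ulp provider could keep all of that behind the socket.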

I'm not sure that such an interface could cover everything that is
needed, but what I'm trying to convey is that the current interface
limits the usability for almost anything else. Please correct me if
I'm wrong. Is this designed to also cater to anything outside of the
kernel nvme-tcp host driver?

> Compatibility
> =============
> * The offload works with bare-metal or SRIOV.
> * The HW can support up to 64K connections per device (assuming no
>    other HW accelerations are used). In this series, we introduce
>    support for up to 4K connections, and we have plans to increase it.
> * SW TLS cannot work together with the NVMeTCP offload, as the HW
>    needs to track the NVMeTCP headers in the TCP stream.

Can't say I like that.

> * The ConnectX HW supports HW TLS, but in ConnectX-7 these features
>    cannot co-exist (and it is not part of this series).
> * The ConnectX-7 NVMeTCP offload HW can support tunneling, but we
>    don't see the need for this feature yet.
> * NVMe poll queues are not in the scope of this series.

bonding/teaming?

> 
> Future Work
> ===========
> * NVMeTCP transmit offload.
> * NVMeTCP host offloads incremental features.
> * NVMeTCP target offload.

Which target? Which host?


