[PATCH v2 0/8] NVMeTCP Offload ULP
Shai Malin
smalin at marvell.com
Sun Jun 20 22:10:38 PDT 2021
With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer, which will
be a new transport type called "tcp-offload" and will serve as an
abstraction layer to work with vendor specific nvme-tcp offload drivers.
NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes
both the TCP level and the NVMeTCP level.
The nvme-tcp-offload transport can co-exist with the existing tcp and
other transports. The tcp offload was designed so that stack changes are
kept to a bare minimum: only registering new transports.
All other APIs, ops etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between the connections which should use the offload path
and those that are not offloaded (even on the same device).
The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
* NVMe layer: *
[ nvme/nvme-fabrics/blk-mq ]
|
(nvme API and blk-mq API)
|
|
* Vendor agnostic transport layer: *
[ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
| | |
(Verbs)
| | |
| (Socket)
| | |
| | (nvme-tcp-offload API)
| | |
| | |
* Vendor Specific Driver: *
| | |
[ RDMA driver ]
| |
[ Network driver ]
|
[ NVMeTCP Offload driver ]
Usage:
======
The user will interact with the network-device in order to configure
the ip/vlan - Logically similar to the RDMA model.
The NVMeTCP configuration is populated as part of the
nvme connect command.
Example:
Assign IP to the net-device (from any existing Linux tool):
ip addr add 100.100.0.101/24 dev p1p1
This IP will be used by both net-device and offload-device.
In order to connect from "sw" nvme-tcp through the net-device:
nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn
In order to connect from "offload" nvme-tcp through the offload-device:
nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn
An alternative approach, and as a future enhancement that will not impact this
series will be to modify nvme-cli with a new flag that will determine
if "-t tcp" should be the regular nvme-tcp (which will be the default)
or nvme-tcp-offload.
Exmaple:
nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]
Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following ops:
- claim_dev() - in order to resolve the route to the target according to
the paired net_dev.
- create_queue() - in order to create offloaded nvme-tcp queue.
The nvme-tcp-offload ULP module shall manage all the controller level
functionalities, call claim_dev and based on the return values shall call
the relevant module create_queue in order to create the admin queue and
the IO queues.
IO-path Design:
===============
The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload
ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
driver and later, the nvme-tcp-offload vendor driver returns the request
completion (the IO completion).
No additional handling is needed in between; this design will reduce the
CPU utilization as we will describe below.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following IO-path ops:
- send_req() - in order to pass the request to the handling of the
offload driver that shall pass it to the vendor specific device.
- poll_queue()
Once the IO completes, the nvme-tcp-offload vendor driver shall call
command.done() that will invoke the nvme-tcp-offload ULP layer to
complete the request.
Teardown and errors:
====================
In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall
call the nvme_tcp_ofld_report_queue_err.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following teardown ops:
- drain_queue()
- destroy_queue()
Changes since RFC v1:
=====================
- nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
- nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
- nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
- nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and
send_req() return values.
Changes since RFC v2:
=====================
- nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
- qedn: Add the Marvell's NVMeTCP HW offload vendor driver init and probe
(patches 8-11).
Changes since RFC v3:
=====================
- nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer
including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new
flows (ASYNC and timeout).
- nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
- nvme-tcp-offload: layer design and optimization changes.
Changes since RFC v4:
=====================
(Many thanks to Hannes Reinecke for his feedback)
- nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
- nvme_tcp_offload: Add per device private_data.
- nvme_tcp_offload: Fix header digest, data digest and tos initialization.
Changes since RFC v5:
=====================
(Many thanks to Sagi Grimberg for his feedback)
- nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
- nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
- nvme_tcp_offload: Change rwsem to mutex.
- nvme_tcp_offload: remove redundant fields.
- nvme_tcp_offload: Remove the "new" from setup_ctrl().
- nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
- nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() ansd
nvme_tcp_ofld_free_queue().
- nvme_tcp_offload: Patch 8 (timeout and async) was squeashed into
patch 7 (io level).
Changes since RFC v6:
=====================
- No changes in nvme_tcp_offload (only in qedn).
Changes since v0:
=====================
- nvme_tcp_offload: Add support for NVME_OPT_HOST_IFACE.
- nvme_tcp_offload: Kconfig fix (thanks to Petr Mladek).
- nvme_tcp_offload: return code fix (thanks to Dan Carpenter).
Arie Gershberg (2):
nvme-tcp-offload: Add controller level implementation
nvme-tcp-offload: Add controller level error recovery implementation
Dean Balandin (3):
nvme-tcp-offload: Add device scan implementation
nvme-tcp-offload: Add queue level implementation
nvme-tcp-offload: Add IO level implementation
Prabhakar Kushwaha (2):
nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
definitions
nvme-fabrics: Expose nvmf_check_required_opts() globally
Shai Malin (1):
nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
MAINTAINERS | 8 +
drivers/nvme/host/Kconfig | 15 +
drivers/nvme/host/Makefile | 3 +
drivers/nvme/host/fabrics.c | 12 +-
drivers/nvme/host/fabrics.h | 9 +
drivers/nvme/host/tcp-offload.c | 1333 +++++++++++++++++++++++++++++++
drivers/nvme/host/tcp-offload.h | 207 +++++
7 files changed, 1578 insertions(+), 9 deletions(-)
create mode 100644 drivers/nvme/host/tcp-offload.c
create mode 100644 drivers/nvme/host/tcp-offload.h
--
2.22.0
More information about the Linux-nvme
mailing list