[PATCH v4 00/20] NVMeTCP Offload ULP
Prabhakar Kushwaha
pkushwaha at marvell.com
Tue Jun 29 05:47:23 PDT 2021
With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer, which will
be a new transport type called "tcp-offload" and will serve as an
abstraction layer to work with device-specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol; it includes
both the TCP level and the NVMeTCP level.
The nvme-tcp-offload transport can co-exist with the existing tcp and
other transports. The tcp offload was designed so that stack changes are
kept to a bare minimum: only registering a new transport.
All other APIs, ops, etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between connections that should use the offload path
and those that are not offloaded (even on the same device).
The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
* NVMe layer: *

             [ nvme/nvme-fabrics/blk-mq ]
                          |
               (nvme API and blk-mq API)
                          |
                          |
* Transport layer: *

   [ nvme-rdma ]  [ nvme-tcp ]  [ nvme-tcp-offload ]
         |             |                 |
      (Verbs)
         |             |                 |
         |         (Socket)
         |             |                 |
         |             |       (nvme-tcp-offload API)
         |             |                 |
         |             |                 |
* Transport Driver: *
         |             |                 |
  [ RDMA driver ]
                       |                 |
               [ Network driver ]
                                         |
                            [ NVMeTCP Offload driver ]
Upstream plan:
==============
As discussed in RFC V7, "NVMeTCP Offload ULP and QEDN Device Driver"
consists of 3 parts:
https://lore.kernel.org/linux-nvme/20210531225222.16992-1-smalin@marvell.com/

This series contains Part 1 and Part 3, intended for linux-nvme:
- Part 1: The nvme-tcp-offload patches.
- Part 3: Marvell's offload device driver (qedn) patches. It has a
  "compilation dependency" on both Part 1 and Part 2.

Part 2 is already accepted in net-next.git:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=eda1bc65b0dc1b03006e427430ba23746ec44714
Usage:
======
The user will interact with the network device in order to configure
the IP/VLAN - logically similar to the RDMA model.
The NVMeTCP configuration is populated as part of the
nvme connect command.

Example:
Assign an IP to the net-device (with any existing Linux tool):

    ip addr add 100.100.0.101/24 dev p1p1

This IP will be used by both the net-device and the offload-device.

In order to connect from "sw" nvme-tcp through the net-device:

    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn

In order to connect from "offload" nvme-tcp through the offload-device:

    nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn

An alternative approach, as a future enhancement that will not impact this
series, would be to modify nvme-cli with a new flag that determines
whether "-t tcp" should be the regular nvme-tcp (which will be the default)
or nvme-tcp-offload.

Example:

    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]
Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
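
For illustration, a minimal sketch of that registration is shown below.
The set of required/allowed option flags is an assumption for exposition;
nvmf_register_transport(), the "tcp_offload" name and
nvme_tcp_ofld_create_ctrl() follow the series.

/* Minimal sketch of the ULP transport registration. The option flags
 * listed here are illustrative assumptions, not the exact set used by
 * drivers/nvme/host/tcp-offload.c.
 */
static struct nvmf_transport_ops nvme_tcp_ofld_transport = {
	.name		= "tcp_offload",
	.module		= THIS_MODULE,
	.required_opts	= NVMF_OPT_TRADDR,
	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_NR_IO_QUEUES |
			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST,
	.create_ctrl	= nvme_tcp_ofld_create_ctrl,
};

static int __init nvme_tcp_ofld_init_module(void)
{
	return nvmf_register_transport(&nvme_tcp_ofld_transport);
}
module_init(nvme_tcp_ofld_init_module);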
The nvme-tcp-offload driver shall register with the nvme-tcp-offload ULP
with the following ops:
- claim_dev() - in order to resolve the route to the target according to
  the paired net_dev.
- create_queue() - in order to create an offloaded nvme-tcp queue.

The nvme-tcp-offload ULP module shall manage all the controller-level
functionalities, call claim_dev(), and based on the return values call
the relevant driver's create_queue() in order to create the admin queue and
the IO queues.
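
A rough sketch of these driver-facing ops is shown below. The struct
layout and argument types are illustrative assumptions; only the
claim_dev()/create_queue() names come from this cover letter.

/* Illustrative only - not the actual tcp-offload.h definitions. */
struct nvme_tcp_ofld_ops {
	const char *name;
	struct module *module;

	/* Resolve whether this offload device owns the route to the
	 * target, according to the paired net_dev.
	 */
	int (*claim_dev)(struct nvme_tcp_ofld_dev *dev,
			 struct nvmf_ctrl_options *opts);

	/* Create one offloaded nvme-tcp queue (admin or IO). */
	int (*create_queue)(struct nvme_tcp_ofld_queue *queue, int qid,
			    size_t queue_size);
};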
IO-path Design:
===============
The nvme-tcp-offload path shall work at the IO level - the nvme-tcp-offload
ULP module shall pass the request (the IO) to the nvme-tcp-offload
driver, and later the nvme-tcp-offload driver returns the request
completion (the IO completion).
No additional handling is needed in between; this design will reduce
CPU utilization, as described below.

The nvme-tcp-offload driver shall register with the nvme-tcp-offload ULP
with the following IO-path ops:
- send_req() - in order to pass the request to the offload driver, which
  shall pass it on to the device-specific offload hardware.
- poll_queue()

Once the IO completes, the nvme-tcp-offload driver shall call
command.done(), which will invoke the nvme-tcp-offload ULP layer to
complete the request.
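
As a sketch of this flow (request/queue types, error handling and the
done() signature are simplified assumptions, not the exact series code):

/* Illustrative IO-path flow. */
static blk_status_t nvme_tcp_ofld_queue_rq(struct blk_mq_hw_ctx *hctx,
					   const struct blk_mq_queue_data *bd)
{
	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(bd->rq);
	struct nvme_tcp_ofld_queue *queue = hctx->driver_data;

	/* Hand the request to the offload driver; the device performs the
	 * full NVMe/TCP exchange (PDUs, TCP, digests) on its own.
	 */
	if (queue->dev->ops->send_req(req))
		return BLK_STS_IOERR;

	return BLK_STS_OK;
}

/* On completion the offload driver invokes the done callback, e.g.:
 *	req->done(req, &cqe->result, cqe->status);
 * which lets the ULP complete the blk-mq request.
 */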
Teardown and errors:
====================
In case of an NVMeTCP queue error, the nvme-tcp-offload driver shall
call nvme_tcp_ofld_report_queue_err().

The nvme-tcp-offload driver shall register with the nvme-tcp-offload ULP
with the following teardown ops:
- drain_queue()
- destroy_queue()
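
A simplified sketch of how the ULP might use these teardown ops
(controller/queue types and helper bodies are assumptions; the
stop/free queue function names appear in the series changelog):

/* Illustrative teardown flow. */
static void nvme_tcp_ofld_stop_queue(struct nvme_ctrl *nctrl, int qid)
{
	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);
	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[qid];

	/* Quiesce the queue and wait for IOs the device already owns. */
	queue->dev->ops->drain_queue(queue);
}

static void nvme_tcp_ofld_free_queue(struct nvme_ctrl *nctrl, int qid)
{
	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);

	/* Release the offloaded connection and its HW resources. */
	ctrl->dev->ops->destroy_queue(&ctrl->queues[qid]);
}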
The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig option "Marvell NVM Express over Fabrics TCP offload".

As part of the qedn init, the driver will register as a PCI device driver
and will work with the Marvell FastLinQ NIC.

As part of the probe, the driver will register with the nvme_tcp_offload
(ULP) and with the qed module (qed_nvmetcp_ops) - similar to other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers.
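
A rough sketch of that probe-time registration is shown below; the helper
names (e.g. qedn_alloc_ctx(), nvme_tcp_ofld_register_dev()) are
hypothetical and only illustrate the order of operations.

/* Hypothetical probe skeleton - helper names are illustrative only. */
static int qedn_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct qedn_ctx *qedn;

	qedn = qedn_alloc_ctx(pdev);	/* hypothetical helper */
	if (!qedn)
		return -ENOMEM;

	/* Attach to the qed core module's NVMeTCP ops (Part 2, net-next). */
	qedn->ops = qed_get_nvmetcp_ops();
	if (!qedn->ops)
		return -EINVAL;

	/* Register the device with the nvme-tcp-offload ULP so that
	 * "-t tcp_offload" connections can be routed to it.
	 */
	return nvme_tcp_ofld_register_dev(&qedn->ofld_dev);
}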
Changes since RFC v1:
=====================
- nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
- nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
- nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
- nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and
send_req() return values.
Changes since RFC v2:
=====================
- nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
- qedn: Add Marvell's NVMeTCP HW offload device driver init and probe
  (patches 8-11).
Changes since RFC v3:
=====================
- nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer
including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new
flows (ASYNC and timeout).
- nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
- nvme-tcp-offload: layer design and optimization changes.
- qedn: Add full implementation for the conn level, IO path and error handling.
Changes since RFC v4:
=====================
(Many thanks to Hannes Reinecke for his feedback)
- nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
- nvme_tcp_offload: Add per device private_data.
- nvme_tcp_offload: Fix header digest, data digest and tos initialization.
- qedn: Remove the qedn_global list.
- qedn: Remove the workqueue flow from send_req.
- qedn: Add db_recovery support.
Changes since RFC v5:
=====================
(Many thanks to Sagi Grimberg for his feedback)
- nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
- nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
- nvme_tcp_offload: Change rwsem to mutex.
- nvme_tcp_offload: remove redundant fields.
- nvme_tcp_offload: Remove the "new" from setup_ctrl().
- nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
- nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() and
  nvme_tcp_ofld_free_queue().
- nvme_tcp_offload: Patch 8 (timeout and async) was squashed into
  patch 7 (io level).
- qedn: Fix the free_queue flow and the destroy_queue flow.
- qedn: Remove version number.
Changes since RFC v6:
=====================
- No changes in nvme_tcp_offload
- qedn: Remove redundant logic in the io-queues core affinity initialization.
- qedn: Remove qedn_validate_cccid_in_range().
Changes since v1:
=====================
- nvme_tcp_offload: Add support for NVME_OPT_HOST_IFACE.
- nvme_tcp_offload: Kconfig fix (thanks to Petr Mladek).
- nvme_tcp_offload: return code fix (thanks to Dan Carpenter).
Changes since v2:
=====================
- nvme_tcp_offload: Fix overly long lines.
- nvme_tcp_offload: use correct terminology for vendor driver.
- qedn: Added qedn driver as part of series.
Changes since v3:
=====================
- nvme_tcp_offload: Rename nvme_tcp_ofld_map_data() to
nvme_tcp_ofld_set_sg_host_data().
Arie Gershberg (2):
nvme-tcp-offload: Add controller level implementation
nvme-tcp-offload: Add controller level error recovery implementation
Dean Balandin (3):
nvme-tcp-offload: Add device scan implementation
nvme-tcp-offload: Add queue level implementation
nvme-tcp-offload: Add IO level implementation
Nikolay Assa (1):
qedn: Add qedn_claim_dev API support
Prabhakar Kushwaha (7):
nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
definitions
nvme-fabrics: Expose nvmf_check_required_opts() globally
qedn: Add connection-level slowpath functionality
qedn: Add support of configuring HW filter block
qedn: Add support of Task and SGL
qedn: Add support of NVME ICReq & ICResp
qedn: Add support of ASYNC
Shai Malin (7):
nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
qedn: Add qedn - Marvell's NVMeTCP HW offload device driver
qedn: Add qedn probe
qedn: Add IRQ and fast-path resources initializations
qedn: Add IO level qedn_send_req and fw_cq workqueue
qedn: Add IO level fastpath functionality
qedn: Add Connection and IO level recovery flows
 MAINTAINERS                      |   18 +
 drivers/nvme/Kconfig             |    1 +
 drivers/nvme/Makefile            |    1 +
 drivers/nvme/host/Kconfig        |   15 +
 drivers/nvme/host/Makefile       |    3 +
 drivers/nvme/host/fabrics.c      |   12 +-
 drivers/nvme/host/fabrics.h      |    9 +
 drivers/nvme/host/tcp-offload.c  | 1346 ++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp-offload.h  |  207 +++++
 drivers/nvme/hw/Kconfig          |    9 +
 drivers/nvme/hw/Makefile         |    3 +
 drivers/nvme/hw/qedn/Makefile    |    4 +
 drivers/nvme/hw/qedn/qedn.h      |  402 +++++++++
 drivers/nvme/hw/qedn/qedn_conn.c | 1076 ++++++++++++++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c | 1109 ++++++++++++++++++++++++
 drivers/nvme/hw/qedn/qedn_task.c |  873 +++++++++++++++++++
 16 files changed, 5079 insertions(+), 9 deletions(-)
create mode 100644 drivers/nvme/host/tcp-offload.c
create mode 100644 drivers/nvme/host/tcp-offload.h
create mode 100644 drivers/nvme/hw/Kconfig
create mode 100644 drivers/nvme/hw/Makefile
create mode 100644 drivers/nvme/hw/qedn/Makefile
create mode 100644 drivers/nvme/hw/qedn/qedn.h
create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
--
2.24.1