[RFC PATCH 0/6] nvme multipath eBPF path selector
Geliang Tang
geliang at kernel.org
Tue Jul 29 19:03:31 PDT 2025
Hi Hannes,
On Tue, 2025-07-29 at 09:06 +0200, hare at kernel.org wrote:
> From: Hannes Reinecke <hare at kernel.org>
>
> Hi all,
>
> there are discussions about deploying more complex I/O scheduling
> algorithms for NVMe, but then there's the question of whether we
> really want to carry these in the kernel. That sounded like an ideal
> testbed for eBPF struct_ops to me.
> Taking a cue from Ming Lei's patchset for eBPF on ublk (thanks, Ming!),
> I've started messing around with eBPF.
I happen to have experience in this area and would like to participate
in the development of nvme-bpf. I have previously developed the MPTCP
BPF packet scheduler [1], which is already in the export branch of the
MPTCP repository [2].
[1] https://github.com/multipath-tcp/mptcp_net-next/issues/75
[2] https://github.com/multipath-tcp/mptcp_net-next/commit/e83320eb669f48effae8a2d203d834ca2454308a
    https://github.com/multipath-tcp/mptcp_net-next/commit/397b7213a2e45bc0c188d5fefa0889899657716f
    https://github.com/multipath-tcp/mptcp_net-next/commit/0c59c5d43f6babd016bdbbf00365257ea57796e9
>
> So here's a patchset to implement nvme multipath eBPF path selectors.
> The idea is quite simple: the eBPF 'struct_ops' program provides a
> 'select_path' function, which selects an nvme_ns struct to use for
> the I/O starting at a given sector.
> Unfortunately, eBPF doesn't allow passing pointers, _and_ the
> definitions for 'struct nvme_ns_head' and 'struct nvme_ns' are
> internal to the nvme subsystem. So I kept those structures as opaque
> pointers for eBPF, and introduced an 'nvme_bpf_iter' structure as a
> path iterator. There are two functions, 'nvme_bpf_first_path' and
> 'nvme_bpf_next_path', which can be used for an open-coded loop over
> all paths.
> I've also added sample code as an example of how the loop can be coded.
>
> It's all pretty rudimentary (as I'm sure people will need accessors
> to get to any namespace or controller details), but that's why I sent
> it out as an RFC. And I am by no means an eBPF expert, so I'd be
> glad for any corrections or suggestions for a better eBPF
> integration.
>
> The entire patchset can be found at:
> git.kernel.org:/pub/scm/linux/kernel/git/hare/scsi-devel.git
> branch nvme-bpf
>
> As usual, reviews and comments are welcome.
>
> Hannes Reinecke (6):
> nvme-multipath: do not assign ->current_path in __nvme_find_path()
> nvme: export nvme_find_get_subsystem()/nvme_put_subsystem()
> nvme: add per-namespace iopolicy sysfs attribute
> nvme: add 'sector' parameter to nvme_find_path()
> nvme-bpf: eBPF struct_ops path selectors
> tools/testing/selftests: add sample nvme bpf path selector
>
> drivers/nvme/host/Kconfig | 9 +
> drivers/nvme/host/Makefile | 1 +
> drivers/nvme/host/bpf.h | 33 ++
> drivers/nvme/host/bpf_ops.c | 347 ++++++++++++++++++
> drivers/nvme/host/core.c | 17 +-
> drivers/nvme/host/ioctl.c | 7 +-
> drivers/nvme/host/multipath.c | 69 +++-
> drivers/nvme/host/nvme.h | 11 +-
> drivers/nvme/host/pr.c | 2 +-
> drivers/nvme/host/sysfs.c | 9 +-
> include/linux/nvme-bpf.h | 54 +++
> .../selftests/bpf/progs/bpf_nvme_simple.c | 52 +++
> 12 files changed, 585 insertions(+), 26 deletions(-)
> create mode 100644 drivers/nvme/host/bpf.h
> create mode 100644 drivers/nvme/host/bpf_ops.c
> create mode 100644 include/linux/nvme-bpf.h
> create mode 100644 tools/testing/selftests/bpf/progs/bpf_nvme_simple.c