[RFC PATCH 0/6] nvme multipath eBPF path selector

Geliang Tang geliang at kernel.org
Tue Jul 29 19:03:31 PDT 2025


Hi Hannes,

On Tue, 2025-07-29 at 09:06 +0200, hare at kernel.org wrote:
> From: Hannes Reinecke <hare at kernel.org>
> 
> Hi all,
> 
> there have been discussions about deploying more complex I/O scheduling
> algorithms for NVMe, but then there's the question of whether we really
> want to carry these in the kernel.
> That sounded like an ideal testbed for eBPF struct_ops to me.
> Taking a cue from Ming Lei's patchset for eBPF on ublk (thanks, Ming!)
> I've started messing around with eBPF.

I happen to have experience in this area and would like to participate
in the development of nvme-bpf. I have previously developed the MPTCP
BPF packet scheduler [1], which is already in the export branch of the
MPTCP repository [2].

[1]
https://github.com/multipath-tcp/mptcp_net-next/issues/75

[2]
https://github.com/multipath-tcp/mptcp_net-next/commit/e83320eb669f48effae8a2d203d834ca2454308a
https://github.com/multipath-tcp/mptcp_net-next/commit/397b7213a2e45bc0c188d5fefa0889899657716f
https://github.com/multipath-tcp/mptcp_net-next/commit/0c59c5d43f6babd016bdbbf00365257ea57796e9

> 
> So here's a patchset to implement nvme multipath eBPF path selectors.
> Idea's quite simple: the eBPF 'struct_ops' program is providing a
> 'select_path' function, which selects a nvme_ns struct to use for
> the I/O starting at a given sector.
> Unfortunately eBPF doesn't allow passing pointers, _and_ the definitions
> for 'struct nvme_ns_head' and 'struct nvme_ns' are internal to the
> nvme subsystem. So I kept those structures as opaque pointers for
> eBPF, and introduced a 'nvme_bpf_iter' structure as a path iterator.
> There are two functions, 'nvme_bpf_first_path' and 'nvme_bpf_next_path',
> which can be used for an open-coded loop over all paths.
> I've also added sample code as an example of how the loop can be coded.
> 
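
To make the open-coded loop concrete, here is a minimal userspace
sketch of the pattern described above. It is not taken from the
patchset: every 'mock_*' name, the struct layouts, the 'usable' and
'inflight' fields, and the least-inflight policy are assumptions made
up for illustration. The real 'nvme_bpf_first_path' and
'nvme_bpf_next_path' signatures may well differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a path; the real 'struct nvme_ns' is opaque. */
struct mock_path {
	int id;
	int usable;             /* assumed flag: 1 if the path may carry I/O */
	unsigned int inflight;  /* assumed counter of outstanding requests */
};

/* Hypothetical stand-in for 'struct nvme_ns_head'. */
struct mock_head {
	const struct mock_path *paths;
	size_t nr_paths;
};

/* Hypothetical iterator, loosely modeled on the 'nvme_bpf_iter' idea. */
struct mock_iter {
	const struct mock_head *head;
	size_t pos;
};

/* Start an iteration; returns the first path or NULL if there is none. */
static const struct mock_path *mock_first_path(struct mock_iter *it,
					       const struct mock_head *head)
{
	it->head = head;
	it->pos = 0;
	return head->nr_paths ? &head->paths[0] : NULL;
}

/* Advance the iteration; returns NULL once all paths have been visited. */
static const struct mock_path *mock_next_path(struct mock_iter *it)
{
	if (it->pos + 1 >= it->head->nr_paths)
		return NULL;
	it->pos++;
	return &it->head->paths[it->pos];
}

/*
 * Sketch of a 'select_path' callback: walk all paths and return the
 * usable one with the fewest requests in flight. The sector argument
 * is accepted but ignored by this particular policy.
 */
static const struct mock_path *mock_select_path(const struct mock_head *head,
						uint64_t sector)
{
	struct mock_iter it;
	const struct mock_path *p, *best = NULL;

	(void)sector;
	for (p = mock_first_path(&it, head); p; p = mock_next_path(&it)) {
		if (!p->usable)
			continue;
		if (!best || p->inflight < best->inflight)
			best = p;
	}
	return best;
}
```

With three paths where the second is unusable and the third has the
fewest requests in flight, mock_select_path() returns the third one;
with no paths at all it returns NULL.
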
> It's all pretty rudimentary (as I'm sure people will need accessors
> to get to any namespace or controller details), but that's why I sent
> it out as an RFC. And I am by no means an eBPF expert, so I'd be
> glad for any corrections or suggestions for a better eBPF
> integration.
> 
> The entire patchset can be found at:
> git.kernel.org:/pub/scm/linux/kernel/git/hare/scsi-devel.git
> branch nvme-bpf
> 
> As usual, reviews and comments are welcome.
> 
> Hannes Reinecke (6):
>   nvme-multipath: do not assign ->current_path in __nvme_find_path()
>   nvme: export nvme_find_get_subsystem()/nvme_put_subsystem()
>   nvme: add per-namespace iopolicy sysfs attribute
>   nvme: add 'sector' parameter to nvme_find_path()
>   nvme-bpf: eBPF struct_ops path selectors
>   tools/testing/selftests: add sample nvme bpf path selector
> 
>  drivers/nvme/host/Kconfig                     |   9 +
>  drivers/nvme/host/Makefile                    |   1 +
>  drivers/nvme/host/bpf.h                       |  33 ++
>  drivers/nvme/host/bpf_ops.c                   | 347 ++++++++++++++++++
>  drivers/nvme/host/core.c                      |  17 +-
>  drivers/nvme/host/ioctl.c                     |   7 +-
>  drivers/nvme/host/multipath.c                 |  69 +++-
>  drivers/nvme/host/nvme.h                      |  11 +-
>  drivers/nvme/host/pr.c                        |   2 +-
>  drivers/nvme/host/sysfs.c                     |   9 +-
>  include/linux/nvme-bpf.h                      |  54 +++
>  .../selftests/bpf/progs/bpf_nvme_simple.c     |  52 +++
>  12 files changed, 585 insertions(+), 26 deletions(-)
>  create mode 100644 drivers/nvme/host/bpf.h
>  create mode 100644 drivers/nvme/host/bpf_ops.c
>  create mode 100644 include/linux/nvme-bpf.h
>  create mode 100644 tools/testing/selftests/bpf/progs/bpf_nvme_simple.c


