[PATCH RFC 0/5] nvme: Controller Data Queue (CDQ) support

Joel Granados joel.granados at kernel.org
Tue Apr 28 13:13:44 PDT 2026


On Tue, Apr 28, 2026 at 02:57:24AM +0000, Chaitanya Kulkarni wrote:
> On 4/24/26 4:37 AM, Joel Granados wrote:
> > This RFC implements Controller Data Queue (CDQ) support in the NVMe
> > driver, a variation of my original RFC sent last July [2]. It exposes an
> > ioctl interface for userspace to create, configure, and delete CDQs
> > backed by DMA-mapped user memory with eventfd notification. In this
> > version I explore how the CDQ protocol logic might live outside the
> > kernel; the ioctl serves as a testing tool but is not necessarily the
> > final interface.
> >
> > This RFC exists within a broader goal, which is to enable NVMe namespace
> > migration. The timing feels right: hardware with CDQ capability exists,
> > the NVMe specification fully defines the feature, and there is growing
> > interest in Live Migration, which by extension includes CDQ.
> >
> > There is, however, no clear consensus on how NVMe Live Migration should
> > land in the Linux kernel. The 2022 discussion [1] explored a VFIO-based
> > approach but reached no conclusion, likely because the specification was
> > not yet mature.
> >
> The above description has significant gaps, and there is already a common
> consensus about this work. Here is a detailed status [1] to avoid any
> confusion about the work that has been actively going on since Aug 2022.
> 
> The correct description is :-
> 
> Based on [1], the direction of this work has already reached a definitive
> conclusion: we will be sorting out the remaining gaps between the spec and
> the upstream kernel in the NVMe LM SW WG, which has been approved by the
> NVMe consortium. That also includes fixing any gaps we may find in the
> spec. Hence, all development for NVMe Live Migration has moved under the
> NVMe Live Migration Software Working Group.
> 
> This effort has involved active and continuous contributions over the past
> three years in the NVMe TWG and the upstream Linux kernel, with participation
> from multiple organizations; see below for all the posted RFCs where this
> work started.
> 
> This multiyear effort has now reached its final phase: we published a
> research paper, shared the design across the industry, worked on the
> specification definition to align it with the kernel implementation, and
> formed the NVMe LM SW WG so that everyone can participate and share the
> work.
> 
> This work has been visible to upstream maintainers and industry
> participants, and has progressed with their awareness, approval, and active
> involvement. I am actively following their guidance and feedback.
> 
> Hence there is no need to host a session at LSFMM this year: we already
> hosted a session on this at LSFMM 2023 (see below), we have everything we
> need to upstream the work, and everyone is aware of the status, see [1].
> 
> From the start of this project I've collaborated with all the necessary
> entities involved and made sure everyone is included. I'll be happy to
> collaborate with you as well.
> 
> -ck

So you are saying that there is an agreement on how NVMe live migration
will be upstreamed into the Linux kernel? (Please correct me if I have
misunderstood.) I did not know that one had been reached:
  1. What was decided?
  2. How is namespace migration included?
  3. Did the nvme_submit_vf_cmd function make the cut?
  4. Is there a public exchange that describes the agreement (I may have
     missed it)?

I don't have a particular preference for any specific solution. My
interest is in understanding how NVMe namespace migration (migration of
actual data, not just controller state) fits into the picture.

AFAIK, the work up until now focuses on controller state migration (and
rightly so, as that is a valuable use case). However, I'm still missing a
discussion on namespace migration, which has only been mentioned
sporadically.

I believe it makes sense to have the session at LSF, as it is a good
opportunity to:
  1. Discuss how namespace migration and CDQ fit into the upstreaming
     effort
  2. Clarify what the agreed approach for LM is (this is still unclear,
     even with this e-mail)

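To make item 1 above more concrete, here is a simplified sketch of the
userspace flow the series aims for. The ioctl number, struct layout, and
field names below are illustrative placeholders, not the exact uapi from
patch 5:

/*
 * Hypothetical userspace flow for creating a CDQ: the driver pins and
 * DMA-maps the user buffer and signals the eventfd when the controller
 * posts new entries. All names below are illustrative, not the real uapi.
 */
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <sys/mman.h>

struct nvme_cdq_create {		/* illustrative layout */
	uint64_t addr;			/* user memory backing the CDQ */
	uint32_t entry_nr;		/* number of CDQ entries */
	int32_t  evfd;			/* eventfd to signal on new entries */
	uint32_t cdq_id;		/* out: ID assigned by the driver */
};
#define NVME_IOCTL_CDQ_CREATE	_IOWR('N', 0x60, struct nvme_cdq_create)

int main(int argc, char **argv)
{
	struct nvme_cdq_create create = { 0 };
	size_t len = 1 << 20;
	uint64_t nev;
	void *buf;
	int fd, efd;

	fd = open(argc > 1 ? argv[1] : "/dev/nvme0", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Plain user memory; the driver pins and DMA-maps it. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	efd = eventfd(0, 0);
	if (buf == MAP_FAILED || efd < 0) {
		perror("mmap/eventfd");
		return 1;
	}

	create.addr = (uint64_t)(uintptr_t)buf;
	create.entry_nr = len / 64;	/* assuming 64-byte entries */
	create.evfd = efd;
	if (ioctl(fd, NVME_IOCTL_CDQ_CREATE, &create) < 0) {
		perror("NVME_IOCTL_CDQ_CREATE");
		return 1;
	}

	/* Block until the controller posts entries, then consume buf. */
	read(efd, &nev, sizeof(nev));
	printf("CDQ %u: %llu notification(s)\n", create.cdq_id,
	       (unsigned long long)nev);
	return 0;
}

However the final uapi shakes out, the point is that the queue lives in
plain user memory and arrival of new entries is signalled through the
eventfd, which keeps the protocol logic in userspace.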
> 
> [1] Status So far :-
> 
> 1. We published a detailed research paper on NVMe Kernel Live
>     Migration in collaboration with industry experts in Aug 2022. This was
>     presented at SNIA SDC 2022, where we received valuable feedback
>     from a broad audience. The design presented there was generally well
>     received and helped guide subsequent standardization efforts.
> 
> 2. Based on that work, we submitted an RFC to the kernel community in
>     Dec 2022 and invited feedback:
>     [https://patchew.org/linux/20221206055816.292304-1-lei.rao@intel.com/]
>     
>     We also conducted an NVMe Live Migration session at LSFMM 2023 to align
>     with the community; Dr. Stephen Bates ran an awesome session:
>     https://lists.infradead.org/pipermail/linux-nvme/2023-February/037786.html
>     We didn't receive any feedback from you or your team :( I hope I didn't
>     miss anything.
> 
> 3. Incorporating the feedback received, we proceeded with specification
>     work and standardized Phase 1. This effort aimed to ensure alignment
>     between the design presented at SDC 2022 and the evolving NVMe spec,
>     as well as the kernel-side implementation. I was one of the leaders in
>     standardizing that work and enhancing the design for Phase 1.
>     
> 4. After completing Phase 1, we shared an updated RFC aligned with the
>     ratified TP 4159 specification and received input from maintainers:
> 
>     [https://lists.infradead.org/pipermail/linux-nvme/2025-August/057717.html]
>     From the RFC cover letter :-
>        4. Patch 0004 implements the TP4159 commands: Suspend, Resume,
>        Get Controller State, and Set Controller State. It also includes
>        debug helpers and command parsing logic.
>     We didn't receive any feedback from you or your team :( I hope I didn't
>     miss anything.
> 
> 5. Given the complexity and compliance requirements of this feature, several
>     organizations initiated the NVMe Live Migration Software Working Group
>     within the NVMe consortium, which has approved it. The goal is to develop
>     an end-to-end solution within the consortium before broader upstream
>     kernel engagement. A growing group of organizations is committed to doing
>     this work in the NVMe LM SW WG and then involving upstream. Everyone is
>     aware that I'm co-coordinating the end-to-end design and development.
>     We didn't receive any feedback from you or your team :( I hope I didn't
>     miss anything.
> 
> > To move CDQ forward, I would like to understand where the LM logic belongs. I
> > currently see two options (between which I have no particular preference):
> >
> > 1. VFIO: Implement NVMe LM following the VFIO state machine, similar to what
> >     was proposed in 2022.
> > 2. VM manager interface: Bypass VFIO and implement LM logic in the interface
> >     between the VM manager (e.g., QEMU) and the NVMe driver.
> >
> > One aspect that has not received much attention in previous discussions
> > is namespace migration, as prior work focused on migrating state rather
> > than the actual data. Migrating potentially terabytes of data is IMO a
> > distinct use case worth considering. LSF/MM/BPF is in a week; I hope this
> > series encourages folks to revisit their positions, give their opinions,
> > and set the stage for face-to-face discussions.
> >
> > Best
> >
> > PS: I'm including the regular NVMe contacts and the folks who seemed to
> > have strong opinions in [2]. I always find it difficult to decide who to
> > include in these, so let me know if you want to be removed in the future
> > or if I have missed someone.
> >
> > [1] https://lore.kernel.org/20221206055816.292304-1-lei.rao@intel.com
> > [2] https://lore.kernel.org/20250714-jag-cdq-v1-0-01e027d256d5@kernel.org
> 
> You have not documented the RFC that I posted for TP 4159 above, which is
> creating a lot more confusion.
That is my bad. After your recap, I see that I had missed part of the
story. I appreciate you filling in the gaps; I'll keep this timeline in
mind going forward.

FYI: I'm adding Klaus Jensen to the CC, as he might be able to add more
nuance to the discussion.

Best
> 
> >
> > Signed-off-by: Joel Granados <joel.granados at kernel.org>
> > ---
> > Joel Granados (5):
> >        nvme: Add CDQ data structures to nvme spec header
> >        nvme: Add CDQ data structures to host driver
> >        nvme: Add NVME_AER_ONE_SHOT callback handler
> >        nvme: Implement CDQ core functionality
> >        nvme: Add CDQ ioctl interface
> >
> >   drivers/nvme/host/core.c        | 312 ++++++++++++++++++++++++++++++++++++++++
> >   drivers/nvme/host/ioctl.c       |  53 ++++++-
> >   drivers/nvme/host/nvme.h        |  20 +++
> >   include/linux/nvme.h            |  50 ++++++-
> >   include/uapi/linux/nvme_ioctl.h |  29 ++++
> >   5 files changed, 462 insertions(+), 2 deletions(-)
> > ---
> > base-commit: 028ef9c96e96197026887c0f092424679298aae8
> > change-id: 20260424-jag-cdq-lkml-cd9b7c79983d
> >
> > Best regards,
> 
> 
> -ck
> 
> 

-- 

Joel Granados