[PATCH RFC 0/5] nvme: Controller Data Queue (CDQ) support
Chaitanya Kulkarni
chaitanyak at nvidia.com
Mon Apr 27 19:57:24 PDT 2026
On 4/24/26 4:37 AM, Joel Granados wrote:
> This RFC implements Controller Data Queue (CDQ) support in the NVMe
> driver, a variation of my original RFC sent last July [2]. It exposes an
> ioctl interface for userspace to create, configure, and delete CDQs
> backed by DMA-mapped user memory with eventfd notification. In this
> version I explore how the CDQ protocol logic might live outside the
> kernel; the ioctl serves as a testing tool but is not necessarily the
> final interface.
>
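For concreteness, below is a minimal userspace sketch of driving such an
interface. The ioctl name, request structure, and field layout are
illustrative assumptions, not the actual uAPI proposed in this series;
only the eventfd/ioctl plumbing is standard.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Placeholder layout; not the series' actual uAPI. */
struct nvme_cdq_create {
	uint64_t uaddr;		/* user buffer backing the CDQ */
	uint32_t nbytes;	/* size of that buffer */
	int32_t  evfd;		/* eventfd signalled on new entries */
	uint16_t cdq_id;	/* returned by the driver */
};

/* Placeholder ioctl number ('N' is the NVMe ioctl magic). */
#define NVME_IOCTL_CDQ_CREATE	_IOWR('N', 0x60, struct nvme_cdq_create)

int main(void)
{
	struct nvme_cdq_create c = { .nbytes = 1 << 20 };
	uint64_t events;
	void *buf;
	int fd, evfd;

	fd = open("/dev/nvme0", O_RDWR);
	evfd = eventfd(0, 0);
	if (fd < 0 || evfd < 0 || posix_memalign(&buf, 4096, c.nbytes))
		return 1;

	c.uaddr = (uint64_t)(uintptr_t)buf;
	c.evfd = evfd;
	if (ioctl(fd, NVME_IOCTL_CDQ_CREATE, &c) < 0) {
		perror("CDQ create");
		return 1;
	}

	/* Block until the controller posts new CDQ entries. */
	if (read(evfd, &events, sizeof(events)) == sizeof(events))
		printf("CDQ %u: new entries available\n", (unsigned)c.cdq_id);
	return 0;
}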
> This RFC exists within a broader goal, which is to enable NVMe namespace
> migration. The timing feels right as hardware with CDQ capability
> exists, NVMe fully specifies the feature and there is growing interest
> in Live Migration which by extension includes CDQ.
>
> There is, however, no clear consensus on how NVMe Live Migration should
> land in the Linux kernel. The 2022 discussion [1] explored a VFIO-based
> approach but reached no conclusion, likely because the specification was
> not yet mature.
>
The above description has significant gaps, and there is already a common
consensus about this work. Here is a detailed status [1] to avoid any
confusion about the work that has been actively going on since August 2022.
The correct description is:
Based on [1], the direction of this work has already reached a definitive
conclusion: the remaining gaps between the spec and the upstream kernel will
be sorted out in the NVMe LM SW WG, which the NVMe consortium has approved.
That also includes fixing any gaps we may find in the spec. Hence all
development for NVMe Live Migration has moved under the NVMe Live Migration
Software Working Group.
This effort has involved active and continuous contributions over the past
three years in the NVMe TWG and the upstream Linux kernel, with participation
from multiple organizations; see below for all the posted RFCs where this
work started.
This multiyear effort, from publishing a research paper, sharing the design
across the industry, and working on the specification definition to align
with the kernel implementation, to forming the NVMe LM SW WG so everyone can
participate and share the work, has now reached its final phase.
This work has been visible to upstream maintainers and industry participants,
and has progressed with their awareness, approval, and active involvement.
I'm actively following their guidance and feedback.
Hence there is no need to host a session at LSFMM this year: we already
hosted a session on this at LSFMM 2023 (see below), we have everything we
need to upstream the work, and everyone is aware of the status, see [1].
From the start of this project I've collaborated with all the necessary
entities involved and made sure everyone is included.
I'll be happy to collaborate with you as well.
-ck
[1] Status so far:
1. We published a detailed research paper on NVMe Kernel Live Migration in
collaboration with industry experts in August 2022. It was presented at
SNIA SDC 2022, where we received valuable feedback from a broad audience.
The design presented there was generally well received and helped guide
subsequent standardization efforts.
2. Based on that work, we submitted an RFC to the kernel community in
December 2022 and invited feedback:
[https://patchew.org/linux/20221206055816.292304-1-lei.rao@intel.com/]
We also conducted an NVMe Live Migration session at LSFMM 2023 to align
with the community; Dr. Stephen Bates ran an excellent session:
https://lists.infradead.org/pipermail/linux-nvme/2023-February/037786.html
We didn't receive any feedback from you or your team :( I hope I didn't
miss anything.
3. Incorporating the feedback received, we proceeded with specification
work and standardized Phase 1. This effort aimed to ensure alignment
between the design presented at SDC 2022, the evolving NVMe spec, and the
kernel-side implementation. I was one of the leaders in standardizing that
work and enhancing the design for Phase 1.
4. After completing Phase 1, we shared an updated RFC aligned with the
ratified specification TP4159 and received input from maintainers:
[https://lists.infradead.org/pipermail/linux-nvme/2025-August/057717.html]
From the RFC cover letter:
    4. Patch 0004 implements the TP4159 commands: Suspend, Resume,
    Get Controller State, and Set Controller State. It also includes
    debug helpers and command parsing logic.
(A sketch of driving these commands from userspace follows this status list.)
We didn't receive any feedback from you or your team :( I hope I didn't
miss anything.
5. Given the complexity and compliance requirements of this feature, several
organizations initiated the NVMe Live Migration Software Working Group
within the NVMe consortium, and the consortium has approved it. The goal is
to develop an end-to-end solution within the consortium before broader
upstream kernel engagement. We have a growing group of organizations
committed to doing this work in the NVMe LM SW WG and then involving
upstream. Everyone is aware that I'm co-coordinating the end-to-end design
and development.
We didn't receive any feedback from you or your team :( I hope I didn't
miss anything.
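As promised under item 4 above, a hedged illustration of issuing a
TP4159-style "Get Controller State" through the generic admin passthru
ioctl. NVME_IOCTL_ADMIN_CMD and struct nvme_admin_cmd are the real uAPI
from <linux/nvme_ioctl.h>; the opcode and CDW layout are placeholders,
consult the ratified TP4159 text for the actual encoding.

#include <fcntl.h>
#include <linux/nvme_ioctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

#define OPC_LM_GET_CTRL_STATE	0xff	/* placeholder opcode */

int main(void)
{
	struct nvme_admin_cmd cmd;
	void *buf;
	int fd;

	fd = open("/dev/nvme0", O_RDWR);
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = OPC_LM_GET_CTRL_STATE;
	cmd.addr = (unsigned long)buf;
	cmd.data_len = 4096;
	/* cdw10..cdw15: controller ID, state scope, offset; see TP4159. */

	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd)) {
		perror("Get Controller State");
		return 1;
	}
	return 0;
}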
> To move CDQ forward, I would like to understand where the LM logic belongs. I
> currently see two options (of which I have no particular preference):
>
> 1. VFIO: Implement NVMe LM following the VFIO state machine, similar to what
> was proposed in 2022.
> 2. VM manager interface: Bypass VFIO and implement LM logic in the interface
> between the VM manager (e.g., QEMU) and the NVMe driver.
>
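For reference on option 1, this is roughly what driving the existing VFIO v2
migration uAPI looks like from userspace. The structures and ioctl below are
the real interface from <linux/vfio.h>; only the surrounding flow (and the
assumption of an NVMe VFIO variant driver implementing it) is illustrative.

#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Ask the variant driver to move the device to a new migration state
 * (e.g. RUNNING -> STOP_COPY on the source, RESUMING on the target).
 * Returns the data_fd for arcs that transfer device state, or -1.
 */
static int mig_set_state(int device_fd, __u32 state)
{
	/* __u64 array keeps the trailing feature payload aligned. */
	__u64 buf[(sizeof(struct vfio_device_feature) +
		   sizeof(struct vfio_device_feature_mig_state) + 7) / 8];
	struct vfio_device_feature *feat = (void *)buf;
	struct vfio_device_feature_mig_state *mig = (void *)feat->data;

	memset(buf, 0, sizeof(buf));
	feat->argsz = sizeof(*feat) + sizeof(*mig);
	feat->flags = VFIO_DEVICE_FEATURE_SET |
		      VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
	mig->device_state = state;

	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feat))
		return -1;
	return mig->data_fd;
}

/*
 * Source: mig_set_state(fd, VFIO_DEVICE_STATE_STOP_COPY), then read()
 * the device state from the returned data_fd until EOF.
 * Target: mig_set_state(fd, VFIO_DEVICE_STATE_RESUMING), write() that
 * state to data_fd, then mig_set_state(fd, VFIO_DEVICE_STATE_RUNNING).
 */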
> One aspect that has not received much attention in previous discussions
> is namespace migration, as prior work focused on migrating state and not
> the actual data. Migrating potentially terabytes of data is IMO a distinct
> use case worth considering. LSF/MM/BPF is in a week. I hope this series
> encourages folks to revisit their positions, give their opinions, and set
> the stage for face-to-face discussions.
>
> Best
>
> PS: I'm including the regular NVMe contacts and the folks that seemed to
> have strong opinions in [2]. I always find it difficult to decide who to
> include in these so let me know if you want to be removed in the future
> or if I have missed someone.
>
> [1] https://lore.kernel.org/20221206055816.292304-1-lei.rao@intel.com
> [2] https://lore.kernel.org/20250714-jag-cdq-v1-0-01e027d256d5@kernel.org
You have not documented the RFC that I posted for TP4159 above, which is
creating a lot more confusion.
>
> Signed-off-by: Joel Granados <joel.granados at kernel.org>
> ---
> Joel Granados (5):
> nvme: Add CDQ data structures to nvme spec header
> nvme: Add CDQ data structures to host driver
> nvme: Add NVME_AER_ONE_SHOT callback handler
> nvme: Implement CDQ core functionality
> nvme: Add CDQ ioctl interface
>
> drivers/nvme/host/core.c | 312 ++++++++++++++++++++++++++++++++++++++++
> drivers/nvme/host/ioctl.c | 53 ++++++-
> drivers/nvme/host/nvme.h | 20 +++
> include/linux/nvme.h | 50 ++++++-
> include/uapi/linux/nvme_ioctl.h | 29 ++++
> 5 files changed, 462 insertions(+), 2 deletions(-)
> ---
> base-commit: 028ef9c96e96197026887c0f092424679298aae8
> change-id: 20260424-jag-cdq-lkml-cd9b7c79983d
>
> Best regards,
-ck