[PATCH 17/17] nvme: enable non-inline passthru commands

Kanchan Joshi joshiiitr at gmail.com
Sun Mar 27 21:44:13 PDT 2022


> >I disagree.  Reusing the same opcode and/or structure for something
> >fundamentally different creates major confusion.  Don't do it.
>
> Ok. If you are open to take new opcode/struct route, that is all we
> require to pair with big-sqe and have this sorted. How about this -
>
> +/* same as nvme_passthru_cmd64 but expecting result field to be pointer */
> +struct nvme_passthru_cmd64_ind {
> +       __u8    opcode;
> +       __u8    flags;
> +       __u16   rsvd1;
> +       __u32   nsid;
> +       __u32   cdw2;
> +       __u32   cdw3;
> +       __u64   metadata;
> +       __u64   addr;
> +       __u32   metadata_len;
> +       union {
> +               __u32   data_len; /* for non-vectored io */
> +               __u32   vec_cnt; /* for vectored io */
> +       };
> +       __u32   cdw10;
> +       __u32   cdw11;
> +       __u32   cdw12;
> +       __u32   cdw13;
> +       __u32   cdw14;
> +       __u32   cdw15;
> +       __u32   timeout_ms;
> +       __u32   rsvd2;
> +       __u64   presult; /* pointer to result */
> +};
> +
>  #define nvme_admin_cmd nvme_passthru_cmd
>
> +#define NVME_IOCTL_IO64_CMD_IND        _IOWR('N', 0x50, struct nvme_passthru_cmd64_ind)
>
> Not heavy on code-volume too, because only one statement (updating
> result) changes and we reuse everything else.
>
> >> >From all that we discussed, maybe the path forward could be this:
> >> - inline-cmd/big-sqe is useful if paired with big-cqe. Drop big-sqe
> >> for now if we cannot go the big-cqe route.
> >> - use only indirect-cmd as this requires nothing special, just regular
> >> sqe and cqe. We can support all passthru commands with a lot less
> >> code. No new ioctl in nvme, so same semantics. For common commands
> >> (i.e. read/write) we can still avoid updating the result (put_user
> >> cost will go).
> >>
> >> Please suggest if we should approach this any differently in v2.
> >
> >Personally I think larger SQEs and CQEs are the only sensible interface
> >here.  Everything else just fails like a horrible hack I would not want
> >to support in NVMe.
>
> So far we have gathered three choices:
>
> (a) big-sqe + new opcode/struct in nvme
> (b) big-sqe + big-cqe
> (c) regular-sqe + regular-cqe
>
> I can post a RFC on big-cqe work if that is what it takes to evaluate
> clearly what path to take. But really, the code is much more compared
> to choice (a) and (c). Differentiating one CQE with another does not
> seem very maintenance-friendly, particularly in liburing.
>
> For (c), I did not get what part feels like horrible hack.
> It is same as how we do sync passthru - read passthru command from
> user-space memory, and update result into that on completion.
> But yes, (a) seems like the best option to me.

Thinking a bit more on "(b) big-sqe + big-cqe". Will that also require
a new ioctl (other than NVME_IOCTL_IO64_CMD) in nvme? Because
semantics will be slightly different (i.e. not updating the result
inside the passthrough command but sending it out-of-band to
io_uring). Or am I just overthinking it.



More information about the Linux-nvme mailing list