[PATCH 17/17] nvme: enable non-inline passthru commands
Kanchan Joshi
joshiiitr at gmail.com
Sun Mar 27 21:44:13 PDT 2022
> >I disagree. Reusing the same opcode and/or structure for something
> >fundamentally different creates major confusion. Don't do it.
>
> Ok. If you are open to take new opcode/struct route, that is all we
> require to pair with big-sqe and have this sorted. How about this -
>
> +/* same as nvme_passthru_cmd64 but expecting result field to be pointer */
> +struct nvme_passthru_cmd64_ind {
> + __u8 opcode;
> + __u8 flags;
> + __u16 rsvd1;
> + __u32 nsid;
> + __u32 cdw2;
> + __u32 cdw3;
> + __u64 metadata;
> + __u64 addr;
> + __u32 metadata_len;
> + union {
> + __u32 data_len; /* for non-vectored io */
> + __u32 vec_cnt; /* for vectored io */
> + };
> + __u32 cdw10;
> + __u32 cdw11;
> + __u32 cdw12;
> + __u32 cdw13;
> + __u32 cdw14;
> + __u32 cdw15;
> + __u32 timeout_ms;
> + __u32 rsvd2;
> + __u64 presult; /* pointer to result */
> +};
> +
> #define nvme_admin_cmd nvme_passthru_cmd
>
> +#define NVME_IOCTL_IO64_CMD_IND _IOWR('N', 0x50, struct nvme_passthru_cmd64_ind)
>
> Not heavy on code-volume too, because only one statement (updating
> result) changes and we reuse everything else.
>
> >> >From all that we discussed, maybe the path forward could be this:
> >> - inline-cmd/big-sqe is useful if paired with big-cqe. Drop big-sqe
> >> for now if we cannot go the big-cqe route.
> >> - use only indirect-cmd as this requires nothing special, just regular
> >> sqe and cqe. We can support all passthru commands with a lot less
> >> code. No new ioctl in nvme, so same semantics. For common commands
> >> (i.e. read/write) we can still avoid updating the result (put_user
> >> cost will go).
> >>
> >> Please suggest if we should approach this any differently in v2.
> >
> >Personally I think larger SQEs and CQEs are the only sensible interface
> >here. Everything else just fails like a horrible hack I would not want
> >to support in NVMe.
>
> So far we have gathered three choices:
>
> (a) big-sqe + new opcode/struct in nvme
> (b) big-sqe + big-cqe
> (c) regular-sqe + regular-cqe
>
> I can post a RFC on big-cqe work if that is what it takes to evaluate
> clearly what path to take. But really, the code is much more compared
> to choice (a) and (c). Differentiating one CQE with another does not
> seem very maintenance-friendly, particularly in liburing.
>
> For (c), I did not get what part feels like horrible hack.
> It is same as how we do sync passthru - read passthru command from
> user-space memory, and update result into that on completion.
> But yes, (a) seems like the best option to me.
Thinking a bit more on "(b) big-sqe + big-cqe". Will that also require
a new ioctl (other than NVME_IOCTL_IO64_CMD) in nvme? Because
semantics will be slightly different (i.e. not updating the result
inside the passthrough command but sending it out-of-band to
io_uring). Or am I just overthinking it.
More information about the Linux-nvme
mailing list