[PATCH 05/17] nvme: wire-up support for async-passthru on char-device.

Christoph Hellwig hch at lst.de
Thu Mar 10 23:01:48 PST 2022


On Tue, Mar 08, 2022 at 08:50:53PM +0530, Kanchan Joshi wrote:
> +/*
> + * This overlays struct io_uring_cmd pdu.
> + * Expect build errors if this grows larger than that.
> + */
> +struct nvme_uring_cmd_pdu {
> +	u32 meta_len;
> +	union {
> +		struct bio *bio;
> +		struct request *req;
> +	};
> +	void *meta; /* kernel-resident buffer */
> +	void __user *meta_buffer;
> +} __packed;

Why is this marked __packed?

In general I'd be much more happy if the meta elelements were a
io_uring-level feature handled outside the driver and typesafe in
struct io_uring_cmd, with just a normal private data pointer for the
actual user, which would remove all the crazy casting.

> +static void nvme_end_async_pt(struct request *req, blk_status_t err)
> +{
> +	struct io_uring_cmd *ioucmd = req->end_io_data;
> +	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
> +	/* extract bio before reusing the same field for request */
> +	struct bio *bio = pdu->bio;
> +
> +	pdu->req = req;
> +	req->bio = bio;
> +	/* this takes care of setting up task-work */
> +	io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb);

This is a bit silly.  First we defer the actual request I/O completion
from the block layer to a different CPU or softirq and then we have
another callback here.  I think it would be much more useful if we
could find a way to enhance blk_mq_complete_request so that it could
directly complet in a given task.  That would also be really nice for
say normal synchronous direct I/O.

> +	if (ioucmd) { /* async dispatch */
> +		if (cmd->common.opcode == nvme_cmd_write ||
> +				cmd->common.opcode == nvme_cmd_read) {

No we can't just check for specific commands in the passthrough handler.

> +			nvme_setup_uring_cmd_data(req, ioucmd, meta, meta_buffer,
> +					meta_len);
> +			blk_execute_rq_nowait(req, 0, nvme_end_async_pt);
> +			return 0;
> +		} else {
> +			/* support only read and write for now. */
> +			ret = -EINVAL;
> +			goto out_meta;
> +		}

Pleae always handle error in the first branch and don't bother with an
else after a goto or return.

> +static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
> +{
> +	int ret;
> +
> +	BUILD_BUG_ON(sizeof(struct nvme_uring_cmd_pdu) > sizeof(ioucmd->pdu));
> +
> +	switch (ioucmd->cmd_op) {
> +	case NVME_IOCTL_IO64_CMD:
> +		ret = nvme_user_cmd64(ns->ctrl, ns, NULL, ioucmd);
> +		break;
> +	default:
> +		ret = -ENOTTY;
> +	}
> +
> +	if (ret >= 0)
> +		ret = -EIOCBQUEUED;

That's a weird way to handle the returns.  Just return -EIOCBQUEUED
directly from the handler (which as said before should be split from
the ioctl handler anyway).



More information about the Linux-nvme mailing list