[PATCH 05/17] nvme: wire-up support for async-passthru on char-device.

Christoph Hellwig hch at lst.de
Tue Mar 15 01:54:10 PDT 2022


On Mon, Mar 14, 2022 at 09:53:56PM +0530, Kanchan Joshi wrote:
>>> +struct nvme_uring_cmd_pdu {
>>> +	u32 meta_len;
>>> +	union {
>>> +		struct bio *bio;
>>> +		struct request *req;
>>> +	};
>>> +	void *meta; /* kernel-resident buffer */
>>> +	void __user *meta_buffer;
>>> +} __packed;
>>
>> Why is this marked __packed?
> Did not like doing it, but had to.
> If not packed, this takes 32 bytes of space, while the driver pdu in
> struct io_uring_cmd can take at most 30 bytes. Packing the nvme pdu
> brought it down to 28 bytes, which fits and gives 2 bytes back.

What if you move meta_len to the end?  Even if we still need the
__packed, that will avoid all the unaligned accesses to the pointers,
which on some architectures will crash the kernel.
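
For illustration, a reordered layout along those lines might look like
this (a sketch only; the fields are the ones from the quoted patch, the
ordering is the suggestion above):

struct nvme_uring_cmd_pdu {
	union {
		struct bio *bio;
		struct request *req;
	};
	void *meta;		/* kernel-resident buffer */
	void __user *meta_buffer;
	u32 meta_len;
} __packed;

With the u32 last, the three pointers sit at offsets 0, 8 and 16 on
64-bit, so they stay naturally aligned even under __packed, and the
total is still 28 bytes.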

> And on moving meta elements outside the driver, my worry is that it
> reduces the scope of the uring-cmd infra and makes it nvme passthru
> specific. At this point uring-cmd is still a generic async ioctl/fsctl
> facility which may find other users (than nvme-passthru) down the line.
> The organization of fields within "struct io_uring_cmd" follows the
> rule that a field is kept out (of the 28-byte pdu) only if it is
> accessed by both io_uring and the driver.
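
As a sketch of that rule (assuming, as in this series, that struct
io_uring_cmd exposes a fixed-size pdu[] scratch array for driver-private
state; the helper shown here is illustrative):

/*
 * Illustrative only: the driver treats the pdu[] scratch area of
 * struct io_uring_cmd as its private per-command state, so the
 * nvme-specific layout must fit in whatever io_uring leaves over
 * after its own shared fields.
 */
static struct nvme_uring_cmd_pdu *nvme_uring_cmd_pdu(struct io_uring_cmd *ioucmd)
{
	BUILD_BUG_ON(sizeof(struct nvme_uring_cmd_pdu) > sizeof(ioucmd->pdu));
	return (struct nvme_uring_cmd_pdu *)&ioucmd->pdu;
}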

We have plenty of other interfaces of that kind.  Sockets are one case
already, and regular read/write with protection information will be
another one.  So having some core infrastructure for "secondary data"
seems very useful.
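
One hypothetical shape for such core infrastructure (purely
illustrative; nothing like this exists in the series under discussion)
would be a generic descriptor that io_uring owns and drivers merely
fill in:

/*
 * Hypothetical sketch of a generic "secondary data" descriptor that
 * io_uring could manage for any command carrying an out-of-band
 * buffer (nvme metadata, protection information, ancillary socket
 * data).  Field names are made up for illustration.
 */
struct io_uring_secondary_data {
	void __user	*ubuf;	/* user buffer to copy to/from */
	void		*kbuf;	/* kernel bounce buffer, if any */
	u32		len;
	u32		flags;	/* e.g. copy back to user on completion */
};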

> I see, so there is room for adding some efficiency.
> Hope it will be ok if I carry this out as a separate effort.
> Since this is about touching blk_mq_complete_request at its heart, and
> improving sync direct-io, this does not seem the best fit and would
> slow this series down.

I'd really rather do this properly.  Especially as the current effort
adds new exported interfaces.

> Deferring by ipi or softirq never occurred, neither for block nor for
> char. Softirq is obvious since I was not running against scsi (or nvme
> with a single queue). I could not spot whether this is really an
> overhead, at least for nvme.

This tends to kick in if you have fewer queues than CPU cores.  Quite
common with either a high core count or a not very high end NVMe
controller.
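
A simplified sketch of the decision the block layer makes at completion
time (not the exact kernel code; roughly the shape of the
blk_mq_complete_need_ipi() check, with details omitted):

/*
 * Simplified sketch, not the actual kernel implementation: when the
 * completion interrupt lands on a CPU other than the one the request
 * was submitted from, and the two CPUs do not share a cache, the
 * completion is deferred via IPI or softirq instead of running
 * inline.  With fewer hardware queues than CPU cores this cross-CPU
 * case becomes common.
 */
static bool complete_needs_deferral(struct request *rq)
{
	int cpu = raw_smp_processor_id();

	if (cpu == rq->mq_ctx->cpu || cpus_share_cache(cpu, rq->mq_ctx->cpu))
		return false;	/* complete inline on this CPU */
	return true;		/* defer via IPI/softirq */
}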


