[PATCH v5 06/10] io_uring/rw: add support to send metadata along with read/write

Kanchan Joshi joshi.k at samsung.com
Tue Oct 29 22:05:19 PDT 2024


On 10/30/2024 4:54 AM, Keith Busch wrote:
> On Tue, Oct 29, 2024 at 09:53:58PM +0530, Anuj Gupta wrote:
>> This patch adds the capability of sending metadata along with read/write.
>> A new meta_type field is introduced in SQE which indicates the type of
>> metadata being passed. This meta is represented by a newly introduced
>> 'struct io_uring_meta_pi' which specifies information such as flags, buffer
>> length, seed and apptag. Application sets up a SQE128 ring, prepares
>> io_uring_meta_pi within the second SQE.
>> The patch processes the user-passed information to prepare uio_meta
>> descriptor and passes it down using kiocb->private.
>>
>> Meta exchange is supported only for direct IO.
>> Also vectored read/write operations with meta are not supported
>> currently.
> 
> It looks like it is reasonable to add support for fixed buffers too.
> There would be implications for subsequent patches, mostly patch 10, but
> it looks like we can do that.

Fixed buffers for data continue to be supported with this.
Do you mean fixed buffers for metadata?
We can take that as an incremental addition outside of this series, which
is already touching various subsystems (io_uring, block, nvme, scsi, fs).

> Anyway, this patch mostly looks okay to me. I don't know about the whole
> "meta_type" thing. My understanding from Pavel was wanting a way to
> chain command specific extra options.

Right. During LSFMM, he mentioned Btrfs needed to send extra stuff with 
read/write.
But in general, this is about seeing metadata as a generic term to 
encode extra information into io_uring SQE.
It is probably not uncommon for people to need to send extra stuff with
read/write and to add specific processing for it. And
SQE->meta_type helps to isolate all such processing from the common case
when no extra stuff is sent.

if (sqe->meta_type)
{
	if (type1(sqe->meta_type))
		process(type1);
	if (type2(sqe->meta_type))
		process(type2);
}
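
To make the shape of that check concrete, here is a rough userspace sketch
of the same dispatch. It is only an illustration, not code from the series:
struct fake_sqe, struct meta_result, the META_TYPE_* bit values and the
process_*() helpers are all made-up names, and rejecting unknown bits is my
assumption about how an extensible type field would be validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical meta types, one bit each; not the values used by the patch. */
#define META_TYPE_PI    (1u << 0)   /* protection information */
#define META_TYPE_HINT  (1u << 1)   /* e.g. a write hint */

/* Hypothetical stand-in for the relevant part of the SQE. */
struct fake_sqe {
	uint16_t meta_type;         /* 0 means no extra information */
};

/* Records which handlers ran, so the dispatch can be observed. */
struct meta_result {
	int pi_done;
	int hint_done;
};

static void process_pi(struct meta_result *res)   { res->pi_done = 1; }
static void process_hint(struct meta_result *res) { res->hint_done = 1; }

/*
 * Returns 0 on success, -1 if an unknown type bit is set.
 * The common case (meta_type == 0) costs a single test.
 */
static int process_meta(const struct fake_sqe *sqe, struct meta_result *res)
{
	assert(sqe != NULL && res != NULL);

	res->pi_done = 0;
	res->hint_done = 0;

	if (!sqe->meta_type)
		return 0;                       /* nothing extra was sent */
	if (sqe->meta_type & ~(META_TYPE_PI | META_TYPE_HINT))
		return -1;                      /* unknown type (assumed policy) */
	if (sqe->meta_type & META_TYPE_PI)
		process_pi(res);
	if (sqe->meta_type & META_TYPE_HINT)
		process_hint(res);
	return 0;
}
```

With one bit per type, a single request can carry more than one kind of
extra information, and a request that sends none skips all of the
type-specific processing.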

> For example, userspace metadata
> and write hints, and this doesn't look like it can be extended to do
> that.

It can be. In the past I used it to represent different types of write
hints.
It is just that in the current version, write hints are being sent without
any type.



