[PATCH v6 06/10] io_uring/rw: add support to send metadata along with read/write

Pavel Begunkov asml.silence at gmail.com
Mon Nov 11 17:32:58 PST 2024


On 11/10/24 18:36, Kanchan Joshi wrote:
> On 11/7/2024 10:53 PM, Pavel Begunkov wrote:
> 
>> Let's say we have 3 different attributes META_TYPE{1,2,3}.
>>
>> How are they placed in an SQE?
>>
>> meta1 = (void *)get_big_sqe(sqe);
>> meta2 = meta1 + sizeof(?); // sizeof(struct meta1_struct)
>> meta3 = meta2 + sizeof(struct meta2_struct);
> 
> Not necessary to do this kind of additions and think in terms of
> sequential ordering for the extra information placed into
> primary/secondary SQE.
> 
> Please see v8:
> https://lore.kernel.org/io-uring/20241106121842.5004-7-anuj20.g@samsung.com/
> 
> It exposes a distinct flag (sqe->ext_cap) for each attribute/cap, and
> userspace should place the corresponding information where kernel has
> mandated.
> 
> If a particular attribute (example write-hint) requires <20b of extra
> information, we should just place that in first SQE. PI requires more so
> we are placing that into second SQE.
> 
> When both PI and write-hint flags are specified by user they can get
> processed fine without actually having to care about above
> additions/ordering.

Ok, so this option statically defines a place in the SQE for each
meta type. The problem is that we can't fit everything into
an SQE, so the next big meta type would need to be a user pointer,
at which point copy_from_user() is expensive again and we need
to invent something new. PI becomes a special case, most likely
handled in a special way: either it ends up as one of a few
"optimised" types, or it needlessly forces its users into SQE128
(with all the additional costs) when it could have been aligned
with other, later meta types.
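
To make the trade-off concrete, here is a minimal sketch of a static
layout. All flag, struct and field names are hypothetical and are not
the actual uapi; it only assumes one small attribute that fits in the
first 64 bytes, PI occupying the second 64 bytes, and a third large
type that no longer fits inline and falls back to a user pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags, one bit per attribute as with sqe->ext_cap. */
enum {
	EXT_CAP_HINT  = 1u << 0,	/* small: lives in the first 64 bytes */
	EXT_CAP_PI    = 1u << 1,	/* large: takes the second 64 bytes */
	EXT_CAP_META3 = 1u << 2,	/* next large type: no inline room left */
};

/* Hypothetical 128-byte SQE with a fixed home for every attribute. */
struct demo_sqe {
	uint16_t ext_cap;	/* which attributes are present */
	uint16_t hint;		/* fixed slot for the small attribute */
	uint64_t meta3_uptr;	/* user pointer, would need copy_from_user() */
	uint8_t  pad[48];	/* rest of the first 64 bytes */
	uint8_t  big[64];	/* second half, only valid with SQE128 */
};

_Static_assert(sizeof(struct demo_sqe) == 128, "two 64-byte halves");
_Static_assert(offsetof(struct demo_sqe, big) == 64, "PI starts at 64");

struct demo_cost {
	bool needs_sqe128;	/* request must be submitted on a big-SQE ring */
	bool needs_user_copy;	/* kernel would have to copy from userspace */
};

/* No iteration or ordering: each flag maps to one fixed location. */
static struct demo_cost demo_classify(const struct demo_sqe *sqe)
{
	struct demo_cost c = { false, false };

	assert(sqe != NULL);
	if (sqe->ext_cap & EXT_CAP_PI)
		c.needs_sqe128 = true;
	if (sqe->ext_cap & EXT_CAP_META3)
		c.needs_user_copy = true;
	return c;
}
```

In this sketch the hint alone costs nothing extra, PI always pays for
SQE128, and the third type always pays for a user copy, which is the
asymmetry described above.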

>> Structures are likely not fixed size (?). At least the PI looks large
>> enough to force everyone to be just aliased to it.
>>
>> And can the user pass first meta2 in the sqe and then meta1?
> 
> Yes. Just set the ext_cap flags without bothering about first/second.
> User can pass either or both, along with the corresponding info. Just
> don't have to assume specific placement into SQE.
> 
> 
>> meta2 = (void *)get_big_sqe(sqe);
>> meta1 = meta2 + sizeof(?); // sizeof(struct meta2_struct)
>>
>> If yes, how parsing should look like? Does the kernel need to read each
>> chunk's type and look up its size to iterate to the next one?
> 
> We don't need to iterate if we are not assuming any ordering.
> 
>> If no, what happens if we want to pass meta2 and meta3, do they start
>> from the big_sqe?
> 
> The one who adds the support for meta2/meta3 in kernel decides where to
> place them within first/second SQE or get them fetched via a pointer
> from userspace.
> 
>> How do we pass how many of such attributes is there for the request?
> 
> ext_cap allows to pass 16 cap/attribute flags. Maybe all can or can not
> be passed inline in SQE, but I have no real visibility about the space
> requirement of future users.

I like ext_cap, if not in its current form / API, then at least as a
user hint: a quick map of which meta types are passed.
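
As an illustration of the "quick map" idea, a minimal sketch assuming
a 16-bit flag word (per the 16 cap/attribute flags mentioned above).
The helper name and the output array are hypothetical; it only shows
that one pass over the bits tells the kernel both how many meta types
are present and which ones, independent of where the payloads live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CAPS 16	/* ext_cap is described as carrying 16 flags */

/*
 * Hypothetical helper: records the bit index of every set flag in
 * types[] (ascending order) and returns how many were found.
 */
static unsigned demo_caps_present(uint16_t ext_cap,
				  unsigned types[DEMO_NR_CAPS])
{
	unsigned n = 0;

	assert(types != NULL);
	for (unsigned bit = 0; bit < DEMO_NR_CAPS; bit++) {
		if (ext_cap & (1u << bit))
			types[n++] = bit;
	}
	return n;
}
```

The return value answers "how many attributes are there", and the
indices could then select a per-type parser, whatever placement scheme
is eventually chosen.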

-- 
Pavel Begunkov


