[PATCH v9 06/11] io_uring: introduce attributes for read/write and PI support

Pavel Begunkov asml.silence at gmail.com
Fri Nov 15 11:03:28 PST 2024


On 11/15/24 17:12, Christoph Hellwig wrote:
> On Fri, Nov 15, 2024 at 04:40:58PM +0000, Pavel Begunkov wrote:
>>> So?  If we have a strong enough requirement for something else we
>>> can triviall add another opcode.  Maybe we should just add different
>>> opcodes for read/write with metadata so that folks don't freak out
>>> about this?
>>
>> IMHO, PI is not so special to have a special opcode for it unlike
>> some more generic read/write with meta / attributes, but that one
>> would have same questions.
> 
> Well, apparently is one the hand hand not general enough that you
> don't want to give it SQE128 space, but you also don't want to give
> it an opcode.

Not like there are no other options. It can be user pointers,
and now we have infra to optimise it if copy_from_user is
expensive.

One thing that doesn't feel right is double indirection, i.e.
a uptr into an array of pointers, at least if IIUC from a quick
glance. I'll follow up on that.

> Maybe we just need make it uring_cmd to get out of these conflicting
> requirements.
> 
> Just to make it clear: I'm not a huge fan of a separate opcode or
> uring_cmd, but compared to the version in this patch it is much better.
> 
>> PI as a special case. And that's more of a problem of the static
>> placing from previous version, e.g. it wouldn't be a problem if in the
>> long run it becomes sth like:
>>
>> struct attr attr, *p;
>>
>> if (flags & META_IN_USE_SQE128)
>> 	p = sqe + 1;
>> else {
>> 	copy_from_user(&attr);
>> 	p = &attr;
>> }
>>
>> but that shouldn't be PI specific.
> 
> Why would anyone not use the SQE128 version?

!SQE128 with user pointer can easily be faster depending on the
ratio of requests that use SQE128 and don't. E.g. one PI read
following with a 100 of send/recv on average. copy_from_user
is not _that_ expensive and we're talking about zeroing an
extra never used afterwards cache line.

Though the main reason would be when you pass 2+ different
attributes and there is no space to put them in SQEs.

-- 
Pavel Begunkov



More information about the Linux-nvme mailing list