[PATCH v9 06/11] io_uring: introduce attributes for read/write and PI support
Pavel Begunkov
asml.silence at gmail.com
Thu Nov 21 05:45:58 PST 2024
On 11/20/24 17:35, Darrick J. Wong wrote:
> On Fri, Nov 15, 2024 at 06:04:01PM +0000, Matthew Wilcox wrote:
>> On Thu, Nov 14, 2024 at 01:09:44PM +0000, Pavel Begunkov wrote:
>>> With SQE128 it's also a problem that now all SQEs are 128 bytes regardless
>>> of whether a particular request needs it or not, and the user will need
>>> to zero them for each request.
>>
>> The way we handled this in NVMe was to use a bit in the command that
>> was called (iirc) FUSED, which let you use two consecutive entries for
>> a single command.
>>
>> Some variant on that could surely be used for io_uring. Perhaps a
>> special opcode that says "the real opcode is here, and this is a two-slot
>> command". Processing gets a little spicy when one slot is the last in
>> the buffer and the next is the the first in the buffer, but that's a SMOP.
>
> I like willy's suggestion -- what's the difficulty in having a SQE flag
> that says "...and keep going into the next SQE"? I guess that
> introduces the problem that you can no longer react to the observation
> of 4 new SQEs by creating 4 new contexts to process those SQEs and throw
> all 4 of them at background threads, since you don't know how many IOs
> are there.
Some variation on "variable size SQE" was discussed back in the day
as an alternative to SQE128. I don't remember exactly why it was
rejected, but I suspect it was precisely the "spicy" case Matthew
mentioned, especially since nvme passthrough spans its payload across
both halves of the SQE.
I'm pretty sure I could find more than a couple of downsides. For
example, to be truly generic it needs a flag in every SQE, and finding
a free bit there is not easy; it would also add some overhead for
everyone else, who doesn't need this extension at all. At the end of
the day, the main concern is how the extra data is placed, not
specifically where it lives (SQE, user memory, etc.).
--
Pavel Begunkov