[PATCH v7 06/10] io_uring/rw: add support to send metadata along with read/write
Kanchan Joshi
joshi.k at samsung.com
Tue Nov 5 22:00:45 PST 2024
On 11/6/2024 10:59 AM, Christoph Hellwig wrote:
> On Tue, Nov 05, 2024 at 09:23:19AM -0700, Keith Busch wrote:
>>>> The SQE128 requirement is only for PI type.
>>>> Another different meta type may just fit into the first SQE. For that we
>>>> don't have to mandate SQE128.
>>>
>>> Ok, I'm really confused now. The way I understood Anuj was that this
>>> is NOT about block level metadata, but about other uses of the big SQE.
>>>
>>> Which version is right? Or did I just completely misunderstand Anuj?
>>
>> Let's not call this "meta_type". Can we use something that has a less
>> overloaded meaning, like "sqe_extended_capabilities", or "ecap", or
>> something like that.
>
> So it's just a flag that a 128-byte SQE is used?
No, this flag says that the user decided to send PI in the SQE. The flag
itself is kept in the first half of the SQE (which always exists). That the
PI fields are kept in the second half of a 128-byte SQE (SQE128, which is
opt-in) is just an additional detail/requirement.
> Don't we know that
> implicitly from the sq?
Yes, we have a separate ring-level flag for that.
#define IORING_SETUP_SQE128 (1U << 10) /* SQEs are 128 byte */
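To make the two levels concrete, here is a minimal userspace sketch of that
relationship: the per-SQE flag sits in the first half of the SQE, while the
PI fields are only available if the ring was set up with
IORING_SETUP_SQE128. The helper name check_meta_type() is hypothetical and
this is not the code from the patch; META_TYPE_PI takes the value it has in
this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define IORING_SETUP_SQE128 (1U << 10) /* ring-level: SQEs are 128 byte */
#define META_TYPE_PI        (1U << 0)  /* per-SQE: PI is passed in the SQE */

/* The per-SQE field is a u16, so every flag has to fit in 16 bits. */
static_assert(META_TYPE_PI <= UINT16_MAX, "meta_type is a u16");

/*
 * Hypothetical helper: returns 0 if the per-SQE meta_type can be
 * honoured on a ring created with ring_setup_flags, -EINVAL otherwise.
 */
static int check_meta_type(uint32_t ring_setup_flags, uint16_t meta_type)
{
	/* PI fields live in the second half of the SQE, so SQE128 is needed. */
	if ((meta_type & META_TYPE_PI) &&
	    !(ring_setup_flags & IORING_SETUP_SQE128))
		return -EINVAL;
	return 0;
}
```

With this, a PI request on a 64-byte-SQE ring is rejected, while a request
without the PI flag is accepted on either kind of ring.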
>>> - a flag that a pointer to metadata is passed. This can work with
>>> a 64-bit SQE.
>>> - another flag that a PI tuple is passed. This requires a 128-byte
>>> and also the previous flag.
>>
>> I don't think anything done so far aligns with what Pavel had in mind.
>> Let me try to lay out what I think he's going for. Just bear with me,
>> this is just a hypothetical example.
>>
>> This patch adds a PI extension.
>> Later, let's say write streams needs another extension.
>> Then key per-IO wants another extension.
>> Then someone else adds wizbang-awesome-feature extension.
>>
>> Let's say you have a device that can do all 4, or any combination of them.
>> Pavel wants a solution that is future proof to such a scenario. So not
>> just a single new "meta_type" with its structure, but a list of types in
>> no particular order, and their structures.
>
> But why do we need the type at all? Each of them obviously needs two
> things:
>
> 1) some space to actually store the extra fields
> 2) a flag that the additional values are passed
Yes, this is exactly how the patch is implemented. 'meta_type' is the
flag that tells that additional values (representing PI info) are passed.
> any single value is not going to help with supporting arbitrary
> combinations,
Not a single value. It is a u16 field, so it can represent 16 possible
flags.
This part in the patch:
+enum io_uring_sqe_meta_type_bits {
+ META_TYPE_PI_BIT,
+ /* not a real meta type; just to make sure that we don't overflow */
+ META_TYPE_LAST_BIT,
+};
+
+/* meta type flags */
+#define META_TYPE_PI (1U << META_TYPE_PI_BIT)
Future users can add things like META_TYPE_KPIO_BIT or
META_TYPE_WRITE_HINT_BIT if they need to send extra information in the SQE.
Note that these users may not require SQE128. It all depends on how much
extra information is required. We still have some free space in the first
half of the SQE.
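As a rough sketch of how such an extension could look (the two extra bits
are only the hypothetical names mentioned above, and META_TYPE_ALL and
meta_type_known() are made up for illustration), the sentinel makes it
possible to check at build time that the flags still fit in the u16 field:

```c
#include <assert.h>
#include <stdint.h>

enum io_uring_sqe_meta_type_bits {
	META_TYPE_PI_BIT,
	META_TYPE_KPIO_BIT,       /* hypothetical future user */
	META_TYPE_WRITE_HINT_BIT, /* hypothetical future user */
	/* not a real meta type; just to make sure that we don't overflow */
	META_TYPE_LAST_BIT,
};

/* meta type flags */
#define META_TYPE_PI         (1U << META_TYPE_PI_BIT)
#define META_TYPE_KPIO       (1U << META_TYPE_KPIO_BIT)
#define META_TYPE_WRITE_HINT (1U << META_TYPE_WRITE_HINT_BIT)

/* Mask of every defined flag (illustrative name, not in the patch). */
#define META_TYPE_ALL        ((1U << META_TYPE_LAST_BIT) - 1)

/* The field is a u16, so at most 16 bits can ever be defined. */
static_assert(META_TYPE_LAST_BIT <= 16, "meta_type flags must fit in a u16");

/* Hypothetical helper: non-zero if only defined bits are set. */
static int meta_type_known(uint16_t meta_type)
{
	return (meta_type & ~META_TYPE_ALL) == 0;
}
```

Adding a new user is then one new enum entry before META_TYPE_LAST_BIT plus
its flag define; undefined bits stay rejectable so they can be given a
meaning later.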
> because well, you can mix and match, and you need
> space for all of them even if you are not using all of them.
Mix-and-match can be detected with the above flags, and so can the case
where two types don't go well together. For such types we can reuse the
space.
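A minimal sketch of that idea, under loudly stated assumptions: the KPIO and
write-hint flags are hypothetical, the choice of PI and KPIO as the pair
that cannot be combined is invented purely for illustration, and the struct
layouts are placeholders rather than the layout used by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define META_TYPE_PI         (1U << 0) /* from this series */
#define META_TYPE_KPIO       (1U << 1) /* hypothetical */
#define META_TYPE_WRITE_HINT (1U << 2) /* hypothetical */
#define META_TYPE_KNOWN \
	(META_TYPE_PI | META_TYPE_KPIO | META_TYPE_WRITE_HINT)

/* Assumption for illustration only: PI and KPIO are mutually exclusive. */
#define META_TYPE_EXCL       (META_TYPE_PI | META_TYPE_KPIO)

/* Placeholder layouts; the field names are invented for this sketch. */
struct meta_pi   { uint64_t addr; uint32_t len; uint16_t flags; uint16_t app_tag; };
struct meta_kpio { uint64_t key_id; };

/* Types that can never be used together may share the same bytes. */
union meta_shared_space {
	struct meta_pi   pi;
	struct meta_kpio kpio;
};
static_assert(sizeof(union meta_shared_space) <= 64,
	      "shared area must fit in the second half of a 128-byte SQE");

/* Hypothetical helper: 0 if the combination is acceptable, else -EINVAL. */
static int validate_meta_type(uint16_t meta_type)
{
	if (meta_type & ~META_TYPE_KNOWN)
		return -EINVAL; /* undefined bit set */
	if ((meta_type & META_TYPE_EXCL) == META_TYPE_EXCL)
		return -EINVAL; /* both members of the exclusive pair set */
	return 0;
}
```

Any combination is a plain bitwise OR of flags, so compatible types can be
passed together in one SQE, while an incompatible pair is caught by a single
mask comparison.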