[PATCH v7 06/10] io_uring/rw: add support to send metadata along with read/write

Christoph Hellwig hch at lst.de
Tue Nov 5 08:00:51 PST 2024


On Tue, Nov 05, 2024 at 09:21:27PM +0530, Kanchan Joshi wrote:
> Can add the documentation (if this version is palatable for Jens/Pavel), 
> but this was discussed in previous iteration:
> 
> 1. Each meta type may have different space requirement in SQE.
> 
> Only for PI, we need so much space that we can't fit that in first SQE. 
> The SQE128 requirement is only for PI type.
> Another different meta type may just fit into the first SQE. For that we 
> don't have to mandate SQE128.

Ok, I'm really confused now.  The way I understood Anuj was that this
is NOT about block level metadata, but about other uses of the big SQE.

Which version is right?  Or did I just completely misunderstand Anuj?

> 2. If two meta types are known not to co-exist, they can be kept in the 
> same place within SQE. Since each meta-type is a flag, we can check what 
> combinations are valid within io_uring and throw the error in case of 
> incompatibility.

And this sounds like what you refer to is not actually block metadata
as in this patchset or nvme (or, weirdly enough, integrity in the block
layer code).

> 3. Previous version was relying on SQE128 flag. If user set the ring 
> that way, it is assumed that PI information was sent.
> This is more explicitly conveyed now - if user passed META_TYPE_PI flag, 
> it has sent the PI. This comment in the code:
> 
> +       /* if sqe->meta_type is META_TYPE_PI, last 32 bytes are for PI */
> +       union {
> 
> If this flag is not passed, parsing of second SQE is skipped, which is 
> the current behavior as now also one can send regular (non pi) 
> read/write on SQE128 ring.

And while I don't understand how this threads in with the previous
statements, this makes sense.  If you only want to send a pointer (+len)
to metadata you can use the normal 64-byte SQE.  If you want to send
a PI tuple you need SQE128.  Is that what the various above statements
try to express?  If so the right API to me would be to have two flags:

 - a flag that a pointer to metadata is passed.  This can work with
   a 64-byte SQE.
 - another flag that a PI tuple is passed.  This requires a 128-byte
   SQE and also the previous flag.
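
A rough sketch of the validation these two flags would imply.  The flag
names and the helper are made up for illustration; they are not taken
from the patch set or from any existing io_uring uapi:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names, for illustration only. */
#define META_FLAG_BUF (1u << 0) /* pointer (+len) to metadata is passed */
#define META_FLAG_PI  (1u << 1) /* PI tuple is passed, needs a 128-byte SQE */

/*
 * Hypothetical helper: returns 0 if the flag combination is acceptable
 * for the given ring setup, -EINVAL otherwise.
 */
static int meta_flags_validate(uint32_t flags, bool ring_is_sqe128)
{
	/* Reject bits we do not know about. */
	if (flags & ~(META_FLAG_BUF | META_FLAG_PI))
		return -EINVAL;

	if (flags & META_FLAG_PI) {
		/* A PI tuple only makes sense together with a metadata buffer. */
		if (!(flags & META_FLAG_BUF))
			return -EINVAL;
		/* The PI tuple lives in the second half of a big SQE. */
		if (!ring_is_sqe128)
			return -EINVAL;
	}

	return 0;
}
```

With this, a metadata pointer alone is accepted on both ring types,
while the PI flag is rejected unless the buffer flag is set and the ring
uses 128-byte SQEs.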





More information about the Linux-nvme mailing list