[PATCH v9 06/11] io_uring: introduce attributes for read/write and PI support
Pavel Begunkov
asml.silence at gmail.com
Fri Nov 15 16:00:43 PST 2024
On 11/14/24 10:45, Anuj Gupta wrote:
> Add the ability to pass additional attributes along with read/write.
> Application can populate an array of 'struct io_uring_attr_vec' and pass
> its address using the SQE field:
> __u64 attr_vec_addr;
>
> Along with number of attributes using:
> __u8 nr_attr_indirect;
>
> Overall 16 attributes are allowed and currently one attribute
> 'ATTR_TYPE_PI' is supported.
Why only 16? It's possible that might need more, 256 would
be a safer choice and fits into u8. I don't think you even
need to commit to a number, it should be ok to add more as
long as it fits into the given types (u8 above). It can also
be u16 as well.
> With PI attribute, userspace can pass following information:
> - flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG}
> - len: length of PI/metadata buffer
> - addr: address of metadata buffer
> - seed: seed value for reftag remapping
> - app_tag: application defined 16b value
In terms of flexibility I like it apart from small nits,
but the double indirection could be a bit inefficient,
this thing:
struct pi_attr pi = {};
attr_array = { &pi, ... };
sqe->attr_addr = attr_array;
So maybe we should just flatten it? An attempt to pseudo
code it to understand what it entails is below. Certainly
buggy and some handling is omitted, but should show the
idea.
// uapi/.../io_uring.h
struct sqe {
...
u64 attr_addr;
/* the total size of the array pointed by attr_addr */
u16 attr_size; /* max 64KB, more than enough */
}
struct io_attr_header {
/* bit mask of attributes passed, can be helpful in the future
* for optimising processing.
*/
u64 attr_type_map;
};
/* each attribute should start with a preamble */
struct io_uring_attr_preamble {
u16 attr_type;
};
// user space
struct PI_param {
struct io_attr_header header;
struct io_uring_attr_preamble preamble;
struct io_uring_attr_pi pi;
};
struct PI_param p = {};
p.header.map = 1 << ATTR_TYPE_PI;
p.preamble.type = ATTR_TYPE_PI;
p.pi = {...};
sqe->attr_addr = &p;
sqe->attr_size = sizeof(p);
The holes b/w structures should be packed better. For the same
reason I don't like a separate preamble structure much, maybe it
should be embedded into the attribute structures, e.g.
struct io_uring_attr_pi {
u16 attr_type;
...
}
The user side looks ok to me, should be pretty straightforward
if the user can define a structure like PI_param, i.e. knows
at compilation time which attributes it wants to use.
// kernel space (current patch set, PI only)
addr = READ_ONCE(sqe->attr_addr);
if (addr) {
size = READ_ONCE(sqe->attr_size);
process_pi(addr, size);
}
process_pi(addr, size) {
struct PI_param p;
if (size != sizeof(PI_attr + struct attr_preamble + struct attr_header))
return -EINVAL;
copy_from_user(p, addr, sizeof(p));
if (p.preamble != ATTR_TYPE_PI)
return -EINVAL;
do_pi_setup(&p->pi);
}
This one is pretty simple as well. A bit more troublesome if
extended with many attributes, but it'd need additional handling
regardless:
process_pi(addr, size) {
if (size < sizeof(header + preamble))
return -EINVAL;
attr_array = malloc(size); // +caching by io_uring
copy_from_user(attr_array);
handle_attributes(attr_array, size);
}
handle_attributes(attr_array, size) {
struct io_attr_header *hdr = attr_array;
offset = sizeof(*hdr);
while (1) {
if (offset + sizeof(struct preamble) > size)
break;
struct preamble *pr = attr_array + offset;
if (pr->type > MAX_TYPES)
return -EINVAL;
attr_size = attr_sizes[pr->type];
if (offset + sizeof(preamble) + attr_size > size)
return -EINVAL;
offset += sizeof(preamble) + attr_size;
process_attr(pr->type, (void *)(pr + 1));
}
}
Some checks can probably be optimised by playing with the uapi
a bit.
attr_type_map is unused here, but I like the idea. I'd love
to see all actual attribute handling to move deeper into the
stack to those who actually need it, but that's for far
away undecided future. E.g.
io_uring_rw {
p = malloc();
copy_from_user(p, sqe->attr_addr);
kiocb->attributes = p;
}
block_do_read {
hdr = kiocb->attributes;
type_mask = /* all types block layer recognises */
if (hdr->attr_type_map & type_mask)
use_attributes();
}
copy_from_user can be optimised, I mentioned before, we can
have a pre-mapped area into which the indirection can point.
The infra is already in there and even used for passing
waiting arguments.
process_pi(addr, size) {
struct PI_param *p, __p;
if (some_flags & USE_REGISTERED_REGION) {
// Glorified p = ctx->ptr; with some checks
p = io_uring_get_mem(addr, size);
} else {
copy_from_user(__p, addr, sizeof(__p));
p = &__p;
}
...
}
In this case all reads would need to be READ_ONCE, but that
shouldn't be a problem. It might also optimise out the kmalloc
in the extended version.
--
Pavel Begunkov
More information about the Linux-nvme
mailing list