[PATCH v7 3/3] io_uring: enable per-io hinting capability

Pavel Begunkov asml.silence at gmail.com
Wed Oct 2 07:26:02 PDT 2024


On 9/30/24 19:13, Kanchan Joshi wrote:
> With F_SET_RW_HINT fcntl, user can set a hint on the file inode, and
> all the subsequent writes on the file pass that hint value down.
> This can be limiting for large files (and for block device) as all the
> writes can be tagged with only one lifetime hint value.
> Concurrent writes (with different hint values) are hard to manage.
> Per-IO hinting solves that problem.
> 
> Allow userspace to pass additional metadata in the SQE.
> The type of passed metadata is expressed by a new field
> 
> 	__u16 meta_type;

The new layout looks nicer, but let me elaborate on the previous
comment. I don't believe we should be restricting to only one
attribute per IO. What if someone wants to pass a lifetime hint
together with integrity information?

Instead, we might need something more extensible like an ability
to pass a list / array of typed attributes / meta information / hints
etc. An example from networking I gave last time was control messages,
i.e. cmsg. In a basic oversimplified form the API from the user
perspective could look like:

struct meta_attr {
	u16 type;
	u64 data;
};

struct meta_attr attr[] = {{HINT, hint_value}, {INTEGRITY, ptr}};
sqe->meta_attrs = attr;
sqe->meta_nr = 2;

I'm pretty sure there will be a bunch of considerations people
will name the API should account for, like sizing the struct
so it can fit integrity bits or even making it variable sized.


> At this point one type META_TYPE_LIFETIME_HINT is supported.
> With this type, user can pass lifetime hint values in the new field
> 
> 	__u64 lifetime_val;
> 
> This accepts all lifetime hint values that are possible with
> F_SET_RW_HINT fcntl.
> 
> The write handlers (io_prep_rw, io_write) send the hint value to
> lower-layer using kiocb. This is good for upporting direct IO,
> but not when kiocb is not available (e.g., buffered IO).
> 
> When per-io hints are not passed, the per-inode hint values are set in
> the kiocb (as before). Otherwise, these take the precedence on per-inode
> hints.
-- 
Pavel Begunkov



More information about the Linux-nvme mailing list