[PATCH v6 3/3] io_uring: enable per-io hinting capability
Hannes Reinecke
hare at suse.de
Tue Sep 24 22:57:31 PDT 2024
On 9/24/24 11:24, Kanchan Joshi wrote:
> With the F_SET_RW_HINT fcntl, a user can set a hint on the file inode, and
> all subsequent writes to the file pass that hint value down.
> This can be limiting for large files (and for block devices), as all
> writes can be tagged with only one lifetime hint value.
> Concurrent writes (with different hint values) are hard to manage.
> Per-IO hinting solves that problem.
>
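For reference, the existing per-inode interface described above can be exercised from userspace roughly as follows. This is a minimal sketch, not part of the patch: the helper name set_inode_write_hint() is hypothetical, and the fallback defines are only there for C libraries whose headers do not expose the constants.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for older headers; values as in include/uapi/linux/fcntl.h. */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036	/* F_LINUX_SPECIFIC_BASE (1024) + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/*
 * Hypothetical helper: set the per-inode write lifetime hint.
 * The fcntl takes a pointer to a 64-bit hint value.
 * Returns 0 on success, -1 with errno set on failure.
 */
static inline int set_inode_write_hint(int fd, uint64_t hint)
{
	return fcntl(fd, F_SET_RW_HINT, &hint);
}
```

A caller would use set_inode_write_hint(fd, RWH_WRITE_LIFE_SHORT) once, after which every write to that inode carries the same hint. That is the limitation the per-IO fields are meant to lift.
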
> Allow userspace to pass the write hint type and its value in the SQE.
> Two new fields are carved out of the leftover space in the SQE:
> __u8 hint_type;
> __u64 hint_val;
>
> Adding hint_type keeps the interface extensible for future use.
> At this point only one type, TYPE_WRITE_LIFETIME_HINT, is supported. With
> this type, the user can pass the lifetime hint values that are currently
> supported by the F_SET_RW_HINT fcntl.
>
> The write handlers (io_prep_rw, io_write) process the hint type/value,
> and the hint value is passed to the lower layers using the kiocb. This
> works for direct IO, but not when the kiocb is not available (e.g.,
> buffered IO).
>
> In general, per-io hints take precedence over per-inode hints.
> Three cases to consider:
>
> Case 1: When hint_type is 0 (explicitly, or implicitly as SQE fields are
> initialized to 0), the user did not send any hint. The per-inode
> hint values are set in the kiocb (as before).
>
> Case 2: When hint_type is TYPE_WRITE_LIFETIME_HINT, hint_val is
> set in the kiocb after sanity checking.
>
> Case 3: When hint_type is anything else, this is flagged as an error
> and the write fails.
>
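The three cases above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the code from io_uring/rw.c: all sketch_* names are hypothetical, the numeric value 1 for the lifetime hint type is a guess (the patch hunk defining it is not quoted here), and -EINVAL is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering: 0 means "no per-IO hint"; the value 1 is a guess. */
enum {
	SKETCH_TYPE_NONE = 0,
	SKETCH_TYPE_WRITE_LIFETIME_HINT = 1,
};

/* Lifetime values as used by F_SET_RW_HINT (RWH_WRITE_LIFE_*). */
enum {
	SKETCH_LIFE_NOT_SET = 0,
	SKETCH_LIFE_NONE = 1,
	SKETCH_LIFE_SHORT = 2,
	SKETCH_LIFE_MEDIUM = 3,
	SKETCH_LIFE_LONG = 4,
	SKETCH_LIFE_EXTREME = 5,
};

/* Same spirit as the BUILD_BUG_ON checks in rw_hint_valid(). */
static_assert(SKETCH_LIFE_EXTREME == 5,
	      "lifetime values are expected to be contiguous 0..5");

/* Range check standing in for rw_hint_valid(). */
static inline bool sketch_rw_hint_valid(uint64_t hint)
{
	return hint <= SKETCH_LIFE_EXTREME;
}

/*
 * Pick the hint that would end up in the kiocb.
 * Returns 0 and stores the hint in *out, or a negative errno.
 */
static inline int sketch_resolve_hint(uint8_t hint_type, uint64_t hint_val,
				      uint8_t inode_hint, uint8_t *out)
{
	switch (hint_type) {
	case SKETCH_TYPE_NONE:			/* Case 1: fall back to inode hint */
		*out = inode_hint;
		return 0;
	case SKETCH_TYPE_WRITE_LIFETIME_HINT:	/* Case 2: validated per-IO hint */
		if (!sketch_rw_hint_valid(hint_val))
			return -EINVAL;
		*out = (uint8_t)hint_val;
		return 0;
	default:				/* Case 3: unknown type */
		return -EINVAL;
	}
}
```

Keeping hint_type == 0 as "no hint" means existing applications, which leave the unused SQE bytes zeroed, keep the per-inode behaviour unchanged.
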
> Signed-off-by: Kanchan Joshi <joshi.k at samsung.com>
> Signed-off-by: Nitesh Shetty <nj.shetty at samsung.com>
> ---
> fs/fcntl.c | 22 ----------------------
> include/linux/rw_hint.h | 24 ++++++++++++++++++++++++
> include/uapi/linux/io_uring.h | 10 ++++++++++
> io_uring/rw.c | 21 ++++++++++++++++++++-
> 4 files changed, 54 insertions(+), 23 deletions(-)
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 081e5e3d89ea..2eb78035a350 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -334,28 +334,6 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
> }
> #endif
>
> -static bool rw_hint_valid(u64 hint)
> -{
> -	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
> -	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
> -	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
> -	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
> -	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
> -	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
> -
> -	switch (hint) {
> -	case RWH_WRITE_LIFE_NOT_SET:
> -	case RWH_WRITE_LIFE_NONE:
> -	case RWH_WRITE_LIFE_SHORT:
> -	case RWH_WRITE_LIFE_MEDIUM:
> -	case RWH_WRITE_LIFE_LONG:
> -	case RWH_WRITE_LIFE_EXTREME:
> -		return true;
> -	default:
> -		return false;
> -	}
> -}
> -
> static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
> index 309ca72f2dfb..f4373a71ffed 100644
> --- a/include/linux/rw_hint.h
> +++ b/include/linux/rw_hint.h
> @@ -21,4 +21,28 @@ enum rw_hint {
> static_assert(sizeof(enum rw_hint) == 1);
> #endif
>
> +#define WRITE_LIFE_INVALID (RWH_WRITE_LIFE_EXTREME + 1)
> +
> +static inline bool rw_hint_valid(u64 hint)
> +{
> +	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
> +	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
> +	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
> +	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
> +	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
> +	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
> +
> +	switch (hint) {
> +	case RWH_WRITE_LIFE_NOT_SET:
> +	case RWH_WRITE_LIFE_NONE:
> +	case RWH_WRITE_LIFE_SHORT:
> +	case RWH_WRITE_LIFE_MEDIUM:
> +	case RWH_WRITE_LIFE_LONG:
> +	case RWH_WRITE_LIFE_EXTREME:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
> #endif /* _LINUX_RW_HINT_H */
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 1fe79e750470..e21a74dd0c49 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -98,6 +98,11 @@ struct io_uring_sqe {
>  			__u64	addr3;
>  			__u64	__pad2[1];
>  		};
> +		struct {
> +			/* To send per-io hint type/value with write command */
> +			__u64	hint_val;
> +			__u8	hint_type;
> +		};
Why is 'hint_val' 64 bits? Everything else is 8 bytes, so wouldn't it
be better to shorten that? As it stands the new struct will introduce
a hole of 24 bytes after 'hint_type'.
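The exact size of any hole depends on the ABI and on the enclosing union, so it may be easier to measure than to argue about. A minimal userspace sketch, assuming a standalone mirror of the two new fields (struct sketch_hint_fields and sketch_tail_padding() are hypothetical names, with stdint types standing in for __u64/__u8):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the anonymous struct added to io_uring_sqe. */
struct sketch_hint_fields {
	uint64_t hint_val;
	uint8_t hint_type;
};

/* hint_type directly follows the 8-byte hint_val. */
static_assert(offsetof(struct sketch_hint_fields, hint_type) == 8,
	      "hint_type is expected right after hint_val");

/* Bytes the compiler adds after hint_type to round up the struct size. */
static inline size_t sketch_tail_padding(void)
{
	return sizeof(struct sketch_hint_fields) -
	       (offsetof(struct sketch_hint_fields, hint_type) +
		sizeof(uint8_t));
}
```

Printing sizeof(struct sketch_hint_fields) and sketch_tail_padding() on the target architecture shows how much of the union slot the new struct really occupies and how much of it is padding.
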
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich