[PATCH 1/2] iouring: one capable call per iouring instance
Jeff Moyer
jmoyer at redhat.com
Mon Dec 4 10:40:58 PST 2023
I added a CC: linux-security-module at vger
Hi, Keith,
Keith Busch <kbusch at meta.com> writes:
> From: Keith Busch <kbusch at kernel.org>
>
> The uring_cmd operation is often used for privileged actions, so drivers
> subscribing to this interface check capable() for each command. The
> capable() function is not fast path friendly for many kernel configs,
> and this can really harm performance. Stash the capable sys admin
> attribute in the io_uring context and set a new issue_flag for the
> uring_cmd interface.
I have a few questions. What privileged actions are performance
sensitive? I would hope that anything requiring privileges would not be
in a fast path (but clearly that's not the case). What performance
benefits did you measure with this patch set in place (and on what
workloads)? What happens when a ring fd is passed to another process?
Finally, as Jens mentioned, I would expect dropping priviliges to, you
know, drop privileges. I don't think a commit message is going to be
enough documentation for a change like this.
Cheers,
Jeff
>
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
> include/linux/io_uring_types.h | 4 ++++
> io_uring/io_uring.c | 1 +
> io_uring/uring_cmd.c | 2 ++
> 3 files changed, 7 insertions(+)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index bebab36abce89..d64d6916753f0 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -36,6 +36,9 @@ enum io_uring_cmd_flags {
> /* set when uring wants to cancel a previously issued command */
> IO_URING_F_CANCEL = (1 << 11),
> IO_URING_F_COMPAT = (1 << 12),
> +
> + /* ring validated as CAP_SYS_ADMIN capable */
> + IO_URING_F_SYS_ADMIN = (1 << 13),
> };
>
> struct io_wq_work_node {
> @@ -240,6 +243,7 @@ struct io_ring_ctx {
> unsigned int poll_activated: 1;
> unsigned int drain_disabled: 1;
> unsigned int compat: 1;
> + unsigned int sys_admin: 1;
>
> struct task_struct *submitter_task;
> struct io_rings *rings;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 1d254f2c997de..4aa10b64f539e 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3980,6 +3980,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
> ctx->syscall_iopoll = 1;
>
> ctx->compat = in_compat_syscall();
> + ctx->sys_admin = capable(CAP_SYS_ADMIN);
> if (!ns_capable_noaudit(&init_user_ns, CAP_IPC_LOCK))
> ctx->user = get_uid(current_user());
>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 8a38b9f75d841..764f0e004aa00 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -164,6 +164,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
> issue_flags |= IO_URING_F_CQE32;
> if (ctx->compat)
> issue_flags |= IO_URING_F_COMPAT;
> + if (ctx->sys_admin)
> + issue_flags |= IO_URING_F_SYS_ADMIN;
> if (ctx->flags & IORING_SETUP_IOPOLL) {
> if (!file->f_op->uring_cmd_iopoll)
> return -EOPNOTSUPP;
More information about the Linux-nvme
mailing list