[PATCH 1/2] iouring: one capable call per iouring instance
Pavel Begunkov
asml.silence at gmail.com
Mon Dec 4 10:45:51 PST 2023
On 12/4/23 18:05, Jens Axboe wrote:
> On 12/4/23 10:53 AM, Keith Busch wrote:
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 1d254f2c997de..4aa10b64f539e 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3980,6 +3980,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
>>  		ctx->syscall_iopoll = 1;
>>
>>  	ctx->compat = in_compat_syscall();
>> +	ctx->sys_admin = capable(CAP_SYS_ADMIN);
>>  	if (!ns_capable_noaudit(&init_user_ns, CAP_IPC_LOCK))
>>  		ctx->user = get_uid(current_user());
>
> Hmm, what happens if the app starts as eg root for initialization
> purposes and drops caps after? That would previously have caused
> passthrough to fail, but now it will work. Perhaps this is fine, after
> all this isn't unusual for eg opening a device or doing other special
> init work?
The side effects would be quite a surprise when you initialize the ring
from a privileged process and then pass it to a less capable one. Ring
sharing would also be affected. Privilege downgrade also sounds like
a valid concern. The first two would be solved by restricting this to
IORING_SETUP_DEFER_TASKRUN rings and using something like

static bool io_is_capable(struct io_ring_ctx *ctx)
{
	return ctx->sys_admin || capable(CAP_SYS_ADMIN);
}

Bypassing the check still doesn't seem great, though, when the real
question is why it's expensive in the first place. I've seen a fat BPF
program running on every call in the wild before; is that what's
happening here?
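
To make the concerns above concrete, here is a rough userspace model of
the cached check. It is only a sketch: task_model, ring_model and the
model_* helpers are hypothetical stand-ins invented for illustration, not
kernel types or functions, and the real capable() does far more than read
a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task; not a kernel structure. */
struct task_model {
	bool cap_sys_admin;	/* the task's current CAP_SYS_ADMIN state */
};

/* Hypothetical stand-in for the ring context. */
struct ring_model {
	bool sys_admin;		/* snapshot taken once, at ring creation */
};

/* Models capable(CAP_SYS_ADMIN): always reflects the caller's live state. */
static inline bool model_capable(const struct task_model *t)
{
	return t->cap_sys_admin;
}

/* Models the patch: the capability is sampled when the ring is created. */
static inline struct ring_model model_ring_create(const struct task_model *creator)
{
	struct ring_model r = { .sys_admin = model_capable(creator) };
	return r;
}

/* Models io_is_capable(): cached snapshot OR a live check of the submitter. */
static inline bool model_io_is_capable(const struct ring_model *r,
				       const struct task_model *submitter)
{
	return r->sys_admin || model_capable(submitter);
}
```

In this model, a ring created while cap_sys_admin is true keeps passing
model_io_is_capable() after the creator clears the flag, and also for any
other task the ring is handed to, even though model_capable() on those
tasks returns false. A ring created without the capability falls back to
the live check, so it only passes once the submitter actually holds it.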
> In any case, that should definitely be explicitly mentioned in the
> commit message for a change like that.
>
--
Pavel Begunkov