[PATCH 1/2] iouring: one capable call per iouring instance

Pavel Begunkov asml.silence at gmail.com
Mon Dec 4 10:45:51 PST 2023


On 12/4/23 18:05, Jens Axboe wrote:
> On 12/4/23 10:53 AM, Keith Busch wrote:
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 1d254f2c997de..4aa10b64f539e 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3980,6 +3980,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
>>   		ctx->syscall_iopoll = 1;
>>   
>>   	ctx->compat = in_compat_syscall();
>> +	ctx->sys_admin = capable(CAP_SYS_ADMIN);
>>   	if (!ns_capable_noaudit(&init_user_ns, CAP_IPC_LOCK))
>>   		ctx->user = get_uid(current_user());
> 
> Hmm, what happens if the app starts as eg root for initialization
> purposes and drops caps after? That would previously have caused
> passthrough to fail, but now it will work. Perhaps this is fine, after
> all this isn't unusual for eg opening device or doing other init special
> work?

The side effects would be quite a surprise if you initialize the ring
in a privileged process and then pass it to a less capable one. Ring
sharing would be affected in the same way. Privilege downgrade also
sounds like a valid concern. The first two would be solved by
restricting this to IORING_SETUP_DEFER_TASKRUN rings and using a
helper like

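static inline bool io_is_capable(struct io_ring_ctx *ctx)
{
	/* cached at ring creation, otherwise fall back to the live check */
	return ctx->sys_admin || capable(CAP_SYS_ADMIN);
}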
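
To make the privilege-downgrade case concrete, here is a toy user-space
model of the cached-or-live check above. It is only a sketch, not kernel
code: every toy_* name is made up for illustration, and a single bool
stands in for the task's current CAP_SYS_ADMIN state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag stands in for the task's current CAP_SYS_ADMIN state. */
struct toy_task {
	bool has_sys_admin;
};

/* Hypothetical stand-in for the ring context; only the cached bit is modeled. */
struct toy_ring {
	bool sys_admin;	/* snapshot taken once, at ring creation */
};

/* Stand-in for capable(CAP_SYS_ADMIN): always reflects the current state. */
static bool toy_capable(const struct toy_task *t)
{
	return t->has_sys_admin;
}

/* Ring creation caches the creator's capability, as in the patch. */
static struct toy_ring toy_ring_create(const struct toy_task *creator)
{
	struct toy_ring r = { .sys_admin = toy_capable(creator) };
	return r;
}

/* Mirrors the cached-or-live check sketched above. */
static bool toy_io_is_capable(const struct toy_ring *r, const struct toy_task *t)
{
	return r->sys_admin || toy_capable(t);
}

/* Walks through the "init as root, then drop caps" scenario. */
static void toy_demo(void)
{
	struct toy_task task = { .has_sys_admin = true };
	struct toy_ring ring = toy_ring_create(&task);

	task.has_sys_admin = false;	/* app drops caps after init */

	assert(!toy_capable(&task));			/* a live check now fails */
	assert(toy_io_is_capable(&ring, &task));	/* the cached bit still passes */
}
```

In this model the task that created the ring keeps passing the check
after it has dropped the capability, which is the downgrade case that
restricting to IORING_SETUP_DEFER_TASKRUN rings does not address.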

And bypassing the check still doesn't seem great when the real question
is why it is expensive in the first place. I've seen a fat BPF program
running on every call in the wild before; is that what happens here?

> In any case, that should definitely be explicitly mentioned in the
> commit message for a change like that.
> 

-- 
Pavel Begunkov


