[RFC PATCH 0/4] net/io_uring: pass a kernel pointer via optlen_t to proto[_ops].getsockopt()
Stefan Metzmacher
metze at samba.org
Tue Apr 1 06:48:58 PDT 2025
Am 01.04.25 um 15:37 schrieb Stefan Metzmacher:
> Am 01.04.25 um 10:19 schrieb Stefan Metzmacher:
>> Am 31.03.25 um 23:04 schrieb Stanislav Fomichev:
>>> On 03/31, Stefan Metzmacher wrote:
>>>> The motivation for this is to remove the SOL_SOCKET limitation
>>>> from io_uring_cmd_getsockopt().
>>>>
>>>> The reason for this limitation is that io_uring_cmd_getsockopt()
>>>> passes a kernel pointer as optlen to do_sock_getsockopt()
>>>> and can't reach the ops->getsockopt() path.
>>>>
>>>> The first idea would be to change the optval and optlen arguments
>>>> to the protocol specific hooks also to sockptr_t, as that
>>>> is already used for setsockopt() and also by do_sock_getsockopt()
>>>> sk_getsockopt() and BPF_CGROUP_RUN_PROG_GETSOCKOPT().
>>>>
>>>> But as Linus don't like 'sockptr_t' I used a different approach.
>>>>
>>>> @Linus, would that optlen_t approach fit better for you?
>>>
>>> [..]
>>>
>>>> Instead of passing the optlen as user or kernel pointer,
>>>> we only ever pass a kernel pointer and do the
>>>> translation from/to userspace in do_sock_getsockopt().
>>>
>>> At this point why not just fully embrace iov_iter? You have the size
>>> now + the user (or kernel) pointer. Might as well do
>>> s/sockptr_t/iov_iter/ conversion?
>>
>> I think that would only be possible if we introduce
>> proto[_ops].getsockopt_iter() and then convert the implementations
>> step by step. Doing it all in one go has a lot of potential to break
>> the uapi. I could try to convert things like socket, ip and tcp myself, but
>> the rest needs to be converted by the maintainer of the specific protocol,
>> as it needs to be tested. As there are crazy things happening in the existing
>> implementations, e.g. some getsockopt() implementations use optval as in and out
>> buffer.
>>
>> I first tried to convert both optval and optlen of getsockopt to sockptr_t,
>> and that showed that touching the optval part starts to get complex very soon,
>> see https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=141912166473bf8843ec6ace76dc9c6945adafd1
>> (note it didn't converted everything, I gave up after hitting
>> sctp_getsockopt_peer_addrs and sctp_getsockopt_local_addrs.
>> sctp_getsockopt_context, sctp_getsockopt_maxseg, sctp_getsockopt_associnfo and maybe
>> more are the ones also doing both copy_from_user and copy_to_user on optval)
>>
>> I come also across one implementation that returned -ERANGE because *optlen was
>> too short and put the required length into *optlen, which means the returned
>> *optlen is larger than the optval buffer given from userspace.
>>
>> Because of all these strange things I tried to do a minimal change
>> in order to get rid of the io_uring limitation and only converted
>> optlen and leave optval as is.
>>
>> In order to have a patchset that has a low risk to cause regressions.
>>
>> But as alternative introducing a prototype like this:
>>
>> int (*getsockopt_iter)(struct socket *sock, int level, int optname,
>> struct iov_iter *optval_iter);
>>
>> That returns a non-negative value which can be placed into *optlen
>> or negative value as error and *optlen will not be changed on error.
>> optval_iter will get direction ITER_DEST, so it can only be written to.
>>
>> Implementations could then opt in for the new interface and
>> allow do_sock_getsockopt() work also for the io_uring case,
>> while all others would still get -EOPNOTSUPP.
>>
>> So what should be the way to go?
>
> Ok, I've added the infrastructure for getsockopt_iter, see below,
> but the first part I wanted to convert was
> tcp_ao_copy_mkts_to_user() and that also reads from userspace before
> writing.
>
> So we could go with the optlen_t approach, or we need
> logic for ITER_BOTH or pass two iov_iters one with ITER_SRC and one
> with ITER_DEST...
>
> So who wants to decide?
I just noticed that it's even possible in same cases
to pass in a short buffer to optval, but have a longer value in optlen,
hci_sock_getsockopt() with SOL_BLUETOOTH completely ignores optlen.
This makes it really hard to believe that trying to use iov_iter for this
is a good idea :-(
Any ideas beside just going with optlen_t?
metze
More information about the linux-afs
mailing list