[PATCH 15/18] nvmet-tcp: enable TLS handshake upcall

Sun Mar 26 00:18:40 PDT 2023

>>>>>>>> The 'data_ready' call might happen at any time after the 
>>>>>>>> 'accept' call and us calling into userspace.
>>>>>>>> In particular we have this flow of control:
>>>>>>>>
>>>>>>>> 1. Kernel: accept()
>>>>>>>> 2. Kernel: handshake request
>>>>>>>> 3. Userspace: read data from socket
>>>>>>>> 4. Userspace: tls handshake
>>>>>>>> 5. Kernel: handshake complete
>>>>>>>>
>>>>>>>> If the 'data_ready' event occurs between 1. and 3. userspace 
>>>>>>>> wouldn't know that something has happened, and will be sitting 
>>>>>>>> there waiting for data which is already present.
>>>>>>>
>>>>>>> Umm, doesn't userspace read from the socket once we trigger the 
>>>>>>> upcall?
>>>>>>> It should. But I still don't understand what the difference is 
>>>>>>> between us waking up userspace and the default sock doing the 
>>>>>>> same.
>>>>>>>
>>>>>> No, it doesn't (or, rather, can't).
>>>>>> After processing 'accept()' (from the kernel code) data might 
>>>>>> already be present (after all, why would we get an 'accept' call 
>>>>>> otherwise?).
>>>>>> But the daemon has not been started up (yet); that's only done in
>>>>>> step 3). But 'data_ready' has already been called, so by the time 
>>>>>> userland is able to do a 'read()' on the socket it won't be seeing 
>>>>>> anything.
>>>>> Not sure I understand. If data exists, userspace will read from the
>>>>> socket and get data, whenever that is.
>>>> That's what I thought, too.
>>>> But then the userspace daemon just sat there doing nothing.
>>>
>>> I haven't been following this discussion in detail, but
>>> if the kernel disables the normal TCP data_ready callback,
>>> then user space won't get any data. That's why SunRPC's
>>> data_ready calls the previous sk_data_ready and then shunts
>>> its own data_ready callback during handshakes. Without that
>>> call to the old sk_data_ready, the user space endpoint won't
>>> see any received data.
>>
>> Yes that is understood. But the solution that Hannes proposed
>> was to introduce nvmet_tcp_tls_data_ready which is overriding
>> the default sock data_ready and does pretty much the same thing.
>>
>> The reason is that today nvmet_tcp_listen_data_ready schedules accept
>> and then pretty much immediately replaces the socket data_ready with
>> nvmet_tcp_data_ready.
>>
>> I think that a simpler solution would be to make
>> nvmet_tcp_listen_data_ready call port->data_ready (the stored default
>> socket data_ready), schedule the accept_work, and override the socket
>> callbacks only after the handshake bounce to userspace is completed.
>>
>> Something like:
>> -- 
>> static void nvmet_tcp_listen_data_ready(struct sock *sk)
>> {
>>          struct nvmet_tcp_port *port;
>>
>>          trace_sk_data_ready(sk);
>>
>>          read_lock_bh(&sk->sk_callback_lock);
>>          port = sk->sk_user_data;
>>          if (!port)
>>                  goto out;
>>
>>          port->data_ready(sk); // trigger socket old data_ready
>>
>>          if (sk->sk_state == TCP_LISTEN)
>>                  queue_work(nvmet_wq, &port->accept_work);
>> out:
>>          read_unlock_bh(&sk->sk_callback_lock);
>> }
>>
> 
> Nearly there.
> 
> The actual patch would be:
> 
> @@ -2031,10 +1988,16 @@ static void nvmet_tcp_listen_data_ready(struct sock *sk)
>          trace_sk_data_ready(sk);
> 
>          read_lock_bh(&sk->sk_callback_lock);
> +       /* Ignore if the callback has been changed */
> +       if (sk->sk_data_ready != nvmet_tcp_listen_data_ready)
> +               goto out;
>          port = sk->sk_user_data;
>          if (!port)
>                  goto out;
> 
> +       if (port->data_ready)
> +               port->data_ready(sk);
> +
>          if (sk->sk_state == TCP_LISTEN)
>                  queue_work(nvmet_wq, &port->accept_work);
>   out:
> 
> As the callbacks will be changed once TLS is activated, and we really 
> should not attempt to run if sk_data_ready() points to another function,
> as then the sk_user_data pointer will most likely be changed, too,
> causing all sorts of issues.

Umm, something is unclear to me. If nvmet_tcp_listen_data_ready is
called, doesn't that by definition mean that sk->sk_data_ready ==
nvmet_tcp_listen_data_ready?

Are you talking about a situation where, between
nvmet_tcp_listen_data_ready starting and it taking
sk->sk_callback_lock, the data_ready cb (and the user data
pointer) is changed?