[PATCH 15/18] nvmet-tcp: enable TLS handshake upcall

Hannes Reinecke hare at suse.de
Fri Mar 24 04:29:23 PDT 2023


On 3/23/23 08:21, Sagi Grimberg wrote:
> 
>>>>>>> The 'data_ready' call might happen at any time after the 'accept' 
>>>>>>> call and us calling into userspace.
>>>>>>> In particular we have this flow of control:
>>>>>>>
>>>>>>> 1. Kernel: accept()
>>>>>>> 2. Kernel: handshake request
>>>>>>> 3. Userspace: read data from socket
>>>>>>> 4. Userspace: tls handshake
>>>>>>> 5. Kernel: handshake complete
>>>>>>>
>>>>>>> If the 'data_ready' event occurs between 1. and 3. userspace 
>>>>>>> wouldn't know that something has happened, and will be sitting 
>>>>>>> there waiting for data which is already present.
>>>>>>
>>>>>> Umm, doesn't userspace read from the socket once we trigger the 
>>>>>> upcall?
>>>>>> it should. But I still don't understand what is the difference 
>>>>>> between
>>>>>> us waiking up userspace, from the default sock doing the same?
>>>>>>
>>>>> No, it doesn't (or, rather, can't).
>>>>> After processing 'accept()' (from the kernel code) data might 
>>>>> already be present (after all, why would we get an 'accept' call 
>>>>> otherwise?).
>>>>> But the daemon has not been started up (yet); that's only done in
>>>>> step 3). But 'data_ready' has already been called, so by the time 
>>>>> userland is able to do a 'read()' on the socket it won't be seeing 
>>>>> anything.
>>>> Not sure I understand. if data exists, userspace will read from the
>>>> socket and get data, whenever that is. >
>>> That's what I thought, too.
>>> But then the userspace daemon just sat there doing nothing.
>>
>> I haven't been following this discussion in detail, but
>> if the kernel disables the normal TCP data_ready callback,
>> then user space won't get any data. That's why SunRPC's
>> data_ready calls the previous sk_data_ready and then shunts
>> its own data_ready callback during handshakes. Without that
>> call to the old sk_data_ready, the user space endpoint won't
>> see any received data.
> 
> Yes that is understood. But the solution that Hannes proposed
> was to introduce nvmet_tcp_tls_data_ready which is overriding
> the default sock data_ready and does pretty much the same thing.
> 
> The reason is that today nvmet_tcp_listen_data_ready schedules accept
> and then pretty much immediately replaces the socket data_ready to
> nvmet_tcp_data_ready.
> 
> I think that a simpler solution was to make nvmet_tcp_listen_data_ready
> call port->data_ready (default socket stored data_ready), schedule
> the accept_work and only after the handshake bounce to userspace is
> completed, override the socket callbacks.
> 
> Something like:
> -- 
> static void nvmet_tcp_listen_data_ready(struct sock *sk)
> {
>          struct nvmet_tcp_port *port;
> 
>          trace_sk_data_ready(sk);
> 
>          read_lock_bh(&sk->sk_callback_lock);
>          port = sk->sk_user_data;
>          if (!port)
>                  goto out;
> 
>          port->data_ready(sk); // trigger socket old data_ready
> 
>          if (sk->sk_state == TCP_LISTEN)
>                  queue_work(nvmet_wq, &port->accept_work);
> out:
>          read_unlock_bh(&sk->sk_callback_lock);
> }
> 

Nearly there.

The actual patch would be:

@@ -2031,10 +1988,16 @@ static void nvmet_tcp_listen_data_ready(struct 
sock *sk)
         trace_sk_data_ready(sk);

         read_lock_bh(&sk->sk_callback_lock);
+       /* Ignore if the callback has been changed */
+       if (sk->sk_data_ready != nvmet_tcp_listen_data_ready)
+               goto out;
         port = sk->sk_user_data;
         if (!port)
                 goto out;

+       if (port->data_ready)
+               port->data_ready(sk);
+
         if (sk->sk_state == TCP_LISTEN)
                 queue_work(nvmet_wq, &port->accept_work);
  out:

As the callbacks will be changed once TLS is activated, and we really 
should not attempt to run if sk_data_ready() points to another function,
as then the sk_user_data pointer will most likely be changed, too,
causing all sorts of issues.

Cheers,

Hannes




More information about the Linux-nvme mailing list