[PATCH 15/18] nvmet-tcp: enable TLS handshake upcall

Sagi Grimberg sagi at grimberg.me
Thu Mar 23 00:21:02 PDT 2023


>>>>>> The 'data_ready' call might happen at any time between the 'accept' call and our call into userspace.
>>>>>> In particular we have this flow of control:
>>>>>>
>>>>>> 1. Kernel: accept()
>>>>>> 2. Kernel: handshake request
>>>>>> 3. Userspace: read data from socket
>>>>>> 4. Userspace: tls handshake
>>>>>> 5. Kernel: handshake complete
>>>>>>
>>>>>> If the 'data_ready' event occurs between 1. and 3., userspace won't know that anything has happened and will sit there waiting for data which is already present.
>>>>>
>>>>> Umm, doesn't userspace read from the socket once we trigger the upcall?
>>>>> It should. But I still don't understand what the difference is between
>>>>> us waking up userspace and the default sock doing the same?
>>>>>
>>>> No, it doesn't (or, rather, can't).
>>>> After processing 'accept()' (from the kernel code) data might already be present (after all, why would we get an 'accept' call otherwise?).
>>>> But the daemon has not been started up (yet); that's only done in
>>>> step 3. So 'data_ready' has already been called, and by the time userland is able to do a 'read()' on the socket it won't be seeing anything.
>>> Not sure I understand. If data exists, userspace will read from the
>>> socket and get data, whenever that is.
>> That's what I thought, too.
>> But then the userspace daemon just sat there doing nothing.
> 
> I haven't been following this discussion in detail, but
> if the kernel disables the normal TCP data_ready callback,
> then user space won't get any data. That's why SunRPC's
> data_ready calls the previous sk_data_ready and then shunts
> its own data_ready callback during handshakes. Without that
> call to the old sk_data_ready, the user space endpoint won't
> see any received data.

Yes, that is understood. But the solution that Hannes proposed
was to introduce nvmet_tcp_tls_data_ready, which overrides
the default sock data_ready and does pretty much the same thing.
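
To make that concrete, here is a minimal sketch of what such an
override amounts to during the handshake window (not Hannes' actual
patch; the queue->data_ready member holding the saved default callback
and the exact chaining are my assumptions):
--
static void nvmet_tcp_tls_data_ready(struct sock *sk)
{
	struct nvmet_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	/* during the handshake there is nothing nvmet-specific to do,
	 * so all this can really do is wake up the userspace daemon,
	 * which is what the default data_ready would have done anyway
	 */
	if (queue && queue->data_ready)
		queue->data_ready(sk);
	read_unlock_bh(&sk->sk_callback_lock);
}
--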

The reason is that today nvmet_tcp_listen_data_ready schedules the
accept work, which then pretty much immediately replaces the socket
data_ready with nvmet_tcp_data_ready.
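
For reference, the listen callback today looks roughly like this
(simplified from memory of drivers/nvme/target/tcp.c); note that it
never chains to the old data_ready before the accept path installs
nvmet_tcp_data_ready, which is exactly the window Hannes described.
The proposal below differs from it by a single chained call:
--
static void nvmet_tcp_listen_data_ready(struct sock *sk)
{
	struct nvmet_tcp_port *port;

	trace_sk_data_ready(sk);

	read_lock_bh(&sk->sk_callback_lock);
	port = sk->sk_user_data;
	if (!port)
		goto out;

	if (sk->sk_state == TCP_LISTEN)
		queue_work(nvmet_wq, &port->accept_work);
out:
	read_unlock_bh(&sk->sk_callback_lock);
}
--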

I think a simpler solution would be to make nvmet_tcp_listen_data_ready
call port->data_ready (the stored default socket data_ready), schedule
the accept_work, and only override the socket callbacks once the
handshake bounce to userspace has completed.

Something like:
--
static void nvmet_tcp_listen_data_ready(struct sock *sk)
{
	struct nvmet_tcp_port *port;

	trace_sk_data_ready(sk);

	read_lock_bh(&sk->sk_callback_lock);
	port = sk->sk_user_data;
	if (!port)
		goto out;

	/* chain to the old (default) socket data_ready, so the
	 * userspace daemon is woken even before we own the socket
	 */
	port->data_ready(sk);

	if (sk->sk_state == TCP_LISTEN)
		queue_work(nvmet_wq, &port->accept_work);
out:
	read_unlock_bh(&sk->sk_callback_lock);
}

...

static void nvmet_tcp_tls_handshake_done(void *data, int status,
					 key_serial_t peerid)
{
	...
	queue->state = NVMET_TCP_Q_CONNECTING;
	nvmet_tcp_set_queue_sock(queue); /* only now take over data_ready */
}
--
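
For this to work, the listener setup has to stash the default callback
before overriding it, i.e. something along these lines in
nvmet_tcp_add_port() (assuming a data_ready member in struct
nvmet_tcp_port to hold it; locking omitted):
--
	/* save the default callback so nvmet_tcp_listen_data_ready
	 * can chain to it, then install our own
	 */
	port->sock->sk->sk_user_data = port;
	port->data_ready = port->sock->sk->sk_data_ready;
	port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
--
That way the daemon still gets its wakeup during the handshake window,
and the nvmet callbacks only take over once the queue is live.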


