[PATCH 13/14] nvmet-tcp: enable TLS handshake upcall
Sagi Grimberg
sagi at grimberg.me
Mon Aug 7 04:49:05 PDT 2023
>>> @@ -389,6 +427,16 @@ static ssize_t nvmet_addr_tsas_store(struct config_item *item,
>>> found:
>>> nvmet_port_init_tsas_tcp(port, nvmet_addr_tsas_tcp[i].type);
>>> + if (nvmet_addr_tsas_tcp[i].type == NVMF_TCP_SECTYPE_TLS13) {
>>> + if (nvmet_port_treq(port) == NVMF_TREQ_NOT_SPECIFIED)
>>> + treq |= NVMF_TREQ_REQUIRED;
>>> + else
>>> + treq |= nvmet_port_treq(port);
>>> + } else {
>>> + /* Set to 'not specified' if TLS is not enabled */
>>> + treq |= NVMF_TREQ_NOT_SPECIFIED;
>>> + }
>>> + port->disc_addr.treq = treq;
>>> return count;
>>> }
>>
>> Can the treq/tsas be split from the actual nvmet-tcp upcall addition?
>>
> I guess it can; it's just that it doesn't make any sense if we don't
> have the TLS upcall implemented.
Still, it's mixing two parts together and makes the patch unnecessarily
complicated.
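To illustrate what I mean, the treq defaulting could sit in a small helper
of its own (untested sketch, helper name made up, ignoring the other treq
flag bits for clarity):

	static u8 nvmet_tcp_tls_treq(struct nvmet_port *port, int sectype)
	{
		if (sectype != NVMF_TCP_SECTYPE_TLS13)
			/* TLS not enabled, don't ask for a secure channel */
			return NVMF_TREQ_NOT_SPECIFIED;
		if (nvmet_port_treq(port) == NVMF_TREQ_NOT_SPECIFIED)
			/* default to 'required' when TLS 1.3 is configured */
			return NVMF_TREQ_REQUIRED;
		return nvmet_port_treq(port);
	}

That way the configfs/treq patch stands on its own and the upcall patch
only has to wire it up.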
>>> @@ -1285,12 +1302,12 @@ static int nvmet_tcp_try_recv(struct nvmet_tcp_queue *queue,
>>> static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
>>> {
>>> - spin_lock(&queue->state_lock);
>>> + spin_lock_irq(&queue->state_lock);
>>
>> Where is this lock taken in irq context that requires disabling irqs?
>>
> Let me check; it might be that this was solved by the workqueue fixes
> which went in recently.
Can you explain what the original issue behind this change was?
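If the lock is only ever taken from process context and from the socket
callbacks (softirq), I would have expected at most the _bh variant
(sketch only, not a suggestion to merge as-is):

	spin_lock_bh(&queue->state_lock);
	/* queue state transition */
	spin_unlock_bh(&queue->state_lock);

Disabling hard interrupts should only be needed if some path takes
state_lock from hard-irq context, which is what I'm asking about.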
>>> +{
>>> + struct nvmet_tcp_queue *queue = data;
>>> +
>>> + pr_debug("queue %d: TLS handshake done, key %x, status %d\n",
>>> + queue->idx, peerid, status);
>>> + if (!status) {
>>> + spin_lock_irq(&queue->state_lock);
>>> + queue->tls_pskid = peerid;
>>> + spin_unlock_irq(&queue->state_lock);
>>> + }
>>> + cancel_delayed_work_sync(&queue->tls_handshake_work);
>>
>> Hmm, the cancel_delayed_work_sync is scary.
>>
>> What happens if it ran and already scheduled a release (which
>> already ran and completed)?
>>
> Well, we need to stop the timeout at some point; granted, for a failure
> we can just flush the workqueue, but in the success case we need to
> terminate the timeout work.
> And nvmet_tcp_schedule_release_queue() already checks for the queue
> state, so we should be (reasonably) safe.
Still, it looks error-prone (although, granted, I didn't spot a particular
race where this breaks).
I wish the handshake code could have given us timeout functionality
so we wouldn't need to deal with it ourselves...
In any event, the synchronization preferably should not require
synchronizing against an active work element on the workqueue.
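One way out (untested sketch, the handler name below is made up) would be
for the timeout handler itself to re-check the handshake result under
state_lock, so the done callback can use a plain cancel_delayed_work()
and never has to wait on the work it may be running from:

	static void nvmet_tcp_tls_handshake_timeout(struct work_struct *w)
	{
		struct nvmet_tcp_queue *queue =
			container_of(to_delayed_work(w), struct nvmet_tcp_queue,
				     tls_handshake_work);

		spin_lock_irq(&queue->state_lock);
		if (queue->tls_pskid) {
			/* handshake completed while we were queued, nothing to do */
			spin_unlock_irq(&queue->state_lock);
			return;
		}
		spin_unlock_irq(&queue->state_lock);
		pr_debug("queue %d: TLS handshake timeout\n", queue->idx);
		nvmet_tcp_schedule_release_queue(queue);
	}

But maybe I'm missing a reason why the _sync variant is required here.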
>>> + pr_debug("queue %d: TLS ServerHello\n", queue->idx);
>>> + memset(&args, 0, sizeof(args));
>>> + args.ta_sock = queue->sock;
>>> + args.ta_done = nvmet_tcp_tls_handshake_done;
>>> + args.ta_data = queue;
>>> + args.ta_keyring = key_serial(queue->port->nport->keyring);
>>> + args.ta_timeout_ms = tls_handshake_timeout * 2 * 1024;
>>
>> * 2 * 1024 ? I didn't know we have 2048 ms in a second...
>>
> The '_ms' bit is just an indicator of the unit, not the value.
> We don't have an 'args.ta_timeout_sec' ...
I expected to see tls_handshake_timeout * 1000, i.e. a trivial
conversion from sec to ms.
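Something along these lines (MSEC_PER_SEC being just 1000):

	args.ta_timeout_ms = tls_handshake_timeout * MSEC_PER_SEC;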