[PATCH 13/14] nvmet-tcp: enable TLS handshake upcall
Sagi Grimberg
sagi at grimberg.me
Mon Aug 7 04:49:05 PDT 2023
>>> @@ -389,6 +427,16 @@ static ssize_t nvmet_addr_tsas_store(struct config_item *item,
>>> found:
>>> nvmet_port_init_tsas_tcp(port, nvmet_addr_tsas_tcp[i].type);
>>> + if (nvmet_addr_tsas_tcp[i].type == NVMF_TCP_SECTYPE_TLS13) {
>>> + if (nvmet_port_treq(port) == NVMF_TREQ_NOT_SPECIFIED)
>>> + treq |= NVMF_TREQ_REQUIRED;
>>> + else
>>> + treq |= nvmet_port_treq(port);
>>> + } else {
>>> + /* Set to 'not specified' if TLS is not enabled */
>>> + treq |= NVMF_TREQ_NOT_SPECIFIED;
>>> + }
>>> + port->disc_addr.treq = treq;
>>> return count;
>>> }
>>
>> Can the treq/tsas be split from the actual nvmet-tcp upcall addition?
>>
> I guess it can; it's just that it doesn't make any sense if we don't
> have the TLS upcall implemented.
Still, it's mixing two parts together and makes the patch unnecessarily
complicated.
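To illustrate what I mean, the treq defaulting could sit in a small helper
of its own (untested sketch, helper name made up, ignoring the other treq
flag bits for clarity):

	static u8 nvmet_tcp_tls_treq(struct nvmet_port *port, int sectype)
	{
		if (sectype != NVMF_TCP_SECTYPE_TLS13)
			/* TLS not enabled, don't ask for a secure channel */
			return NVMF_TREQ_NOT_SPECIFIED;
		if (nvmet_port_treq(port) == NVMF_TREQ_NOT_SPECIFIED)
			/* default to 'required' when TLS 1.3 is configured */
			return NVMF_TREQ_REQUIRED;
		return nvmet_port_treq(port);
	}

That way the configfs/treq patch stands on its own and the upcall patch
only has to wire it up.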
>>> @@ -1285,12 +1302,12 @@ static int nvmet_tcp_try_recv(struct nvmet_tcp_queue *queue,
>>> static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
>>> {
>>> - spin_lock(&queue->state_lock);
>>> + spin_lock_irq(&queue->state_lock);
>>
>> Where is this lock taken in irq context that requires disabling irqs?
>>
> Let me check; it might be that this was solved by the workqueue fixes
> which went in recently.
Can you explain what the original issue behind this change was?
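If the lock is only ever taken from process context and from the socket
callbacks (softirq), I would have expected at most the _bh variant
(sketch only, not a suggestion to merge as-is):

	spin_lock_bh(&queue->state_lock);
	/* queue state transition */
	spin_unlock_bh(&queue->state_lock);

Disabling hard interrupts should only be needed if some path takes
state_lock from hard-irq context, which is what I'm asking about.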
>>> +{
>>> + struct nvmet_tcp_queue *queue = data;
>>> +
>>> + pr_debug("queue %d: TLS handshake done, key %x, status %d\n",
>>> + queue->idx, peerid, status);
>>> + if (!status) {
>>> + spin_lock_irq(&queue->state_lock);
>>> + queue->tls_pskid = peerid;
>>> + spin_unlock_irq(&queue->state_lock);
>>> + }
>>> + cancel_delayed_work_sync(&queue->tls_handshake_work);
>>
>> Hmm, the cancel_delayed_work_sync is scary.
>>
>> What happens if it ran and already scheduled a release (which
>> already ran and completed)?
>>
> Well, we need to stop the timeout at some point; granted, for a failure
> we can just flush the workqueue, but in the success case we need to
> terminate the timeout work.
> And nvmet_tcp_schedule_release_queue() already checks for the queue
> state, so we should be (reasonably) safe.
Still, it looks error-prone (although, granted, I didn't spot a particular
race where this breaks).
I wish the handshake code could have given us timeout functionality
so we wouldn't need to deal with it ourselves...
In any event, the synchronization preferably should not require
synchronizing against an active work element on the workqueue.
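One way out (untested sketch, the handler name below is made up) would be
for the timeout handler itself to re-check the handshake result under
state_lock, so the done callback can use a plain cancel_delayed_work()
and never has to wait on the work it may be running from:

	static void nvmet_tcp_tls_handshake_timeout(struct work_struct *w)
	{
		struct nvmet_tcp_queue *queue =
			container_of(to_delayed_work(w), struct nvmet_tcp_queue,
				     tls_handshake_work);

		spin_lock_irq(&queue->state_lock);
		if (queue->tls_pskid) {
			/* handshake completed while we were queued, nothing to do */
			spin_unlock_irq(&queue->state_lock);
			return;
		}
		spin_unlock_irq(&queue->state_lock);
		pr_debug("queue %d: TLS handshake timeout\n", queue->idx);
		nvmet_tcp_schedule_release_queue(queue);
	}

But maybe I'm missing a reason why the _sync variant is required here.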
>>> + pr_debug("queue %d: TLS ServerHello\n", queue->idx);
>>> + memset(&args, 0, sizeof(args));
>>> + args.ta_sock = queue->sock;
>>> + args.ta_done = nvmet_tcp_tls_handshake_done;
>>> + args.ta_data = queue;
>>> + args.ta_keyring = key_serial(queue->port->nport->keyring);
>>> + args.ta_timeout_ms = tls_handshake_timeout * 2 * 1024;
>>
>> * 2 * 1024 ? I didn't know we have 2048 ms in a second...
>>
> The '_ms' bit is just an indicator of the unit, not the value.
> We don't have an 'args.ta_timeout_sec' ...
I expected to see tls_handshake_timeout * 1000, i.e. a trivial
conversion from sec to ms.
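Something along these lines (MSEC_PER_SEC being just 1000):

	args.ta_timeout_ms = tls_handshake_timeout * MSEC_PER_SEC;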