[PATCHv11 00/18] nvme: In-kernel TLS support for TCP

Hannes Reinecke hare at suse.de
Mon Sep 18 05:15:26 PDT 2023


On 9/14/23 20:10, Sagi Grimberg wrote:
> 
> 
> On 9/11/23 15:09, Hannes Reinecke wrote:
>> On 9/6/23 12:04, Hannes Reinecke wrote:
>>> On 9/6/23 10:48, Pawel Baldysiak wrote:
>>>> Hi Hannes,
>>>>
>>>> I'm testing the v15 patchset extensively and I'm observing that it
>>>> is not stable.
>>>> I'm running it on two connected machines - one is used as a host,
>>>> the second as the target.
>>>> Once connected I'm running I/Os with dd.
>>>> Eventually I got I/O timeouts from the nvme layer on the host,
>>>> causing connect/disconnect loops and hung I/O.
>>>> It does not matter whether the workload is reads or writes.
>>>>
>>> Do you have the kernel message log here?
>>> It _might_ be that we're overloading the TLS stack, such that 
>>> processing of individual packets is taking too long and triggering a 
>>> KATO.
>>>
>>>> The second test is connect/disconnect loops.
>>>> On disconnect I see occasional delays due to nvme command timeouts.
>>>> The disconnect happens eventually, but only after hitting the timeout.
>>>>
>>> Again, do you have message logs?
>>>
>> Managed to validate it with a larger setup.
>> Message log is:
>> nvme nvme0: mapped 16/0/0 default/read/poll queues.
>> nvme nvme0: new ctrl: NQN "nqn.exuma", addr 10.168.136.124:4420
>> nvme nvme0: Removing ctrl: NQN "nqn.exuma"
>> nvme nvme0: queue 1: timeout cid 0x3071 type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999927, 9 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x2078 type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999872, 4 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x2079 type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999877, 1 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x207a type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999879, 2 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x207b type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999882, 4 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x207c type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999887, 21 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x207d type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999909, 6 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> nvme nvme0: queue 1: timeout cid 0x207e type 4 opcode 0x2 (Read)
>> nvme0c0n1: Read(0x2) @ LBA 15999916, 10 blocks, Host Aborted Command 
>> (sct 0x3 / sc 0x71)
>> block nvme0n1: no available path - failing I/O
>> Buffer I/O error on dev nvme0n1, logical block 15999872, async page read
>> nvme nvme0: failed to send request -32
>> nvme nvme0: Property Set error: 880, offset 0x14
> 
> Can you add timestamps to the log trace?
> 
>>
>> Which actually points to an issue in the teardown flow.
>> nvme_do_delete_ctrl() does this:
>>
>>         nvme_stop_ctrl(ctrl);
>>         nvme_remove_namespaces(ctrl);
>>         ctrl->ops->delete_ctrl(ctrl);
>>
>> and ->delete_ctrl()
>>   ->nvme_tcp_delete_ctrl()
>>     ->nvme_tcp_teardown_ctrl()
>>       ->nvme_tcp_teardown_io_queues()
>>
>> has:
>>         nvme_quiesce_admin_queue(ctrl);
>>         nvme_quiesce_io_queues(ctrl);
>>         nvme_sync_io_queues(ctrl);
>>         nvme_tcp_stop_io_queues(ctrl);
>>
>> Now, if you look at 'nvme_sync_io_queues()' you'll find
>>
>>         down_read(&ctrl->namespaces_rwsem);
>>         list_for_each_entry(ns, &ctrl->namespaces, list)
>>                 blk_sync_queue(ns->queue);
>>         up_read(&ctrl->namespaces_rwsem);
>>
>> but we've removed all namespaces right at the start in
>> nvme_do_delete_ctrl(), so this is a no-op and we'll never
>> catch any I/O in flight, causing the timeouts to occur :-(
> 
> Hannes, it's not clear to me what you're saying here. What is different
> in TLS that makes the teardown go that slow?
> 
> Also, what is being blocked here? Are there still I/Os to an already
> deleted namespace?

Hmm. Seems to be an interrelated issue. The problem is that TLS sessions
occasionally time out right after the connection is initiated, and it
takes a few reconnect attempts to get things sorted out.
Still debugging; it looks like an issue with the network stack.
While a reconnect is underway the I/O is of course aborted, but
depending on the timing a namespace scan might also be in progress, so
we're blocked in 'flush_work(&ctrl->scan_work)' in
nvme_remove_namespaces() and have to wait for the command timeout on
the namespace Identify et al.
That part is probably expected, so I'll concentrate on figuring out why
we're seeing the reconnects in the first place.
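
For reference, a rough sketch (hand-annotated, not the actual kernel
source) of the ordering discussed above and where the delay comes from:

    static void nvme_do_delete_ctrl(struct nvme_ctrl *ctrl)
    {
            nvme_stop_ctrl(ctrl);

            /*
             * flush_work(&ctrl->scan_work) inside nvme_remove_namespaces()
             * blocks here if a scan is still running; with the connection
             * already gone the scan's Identify commands only complete once
             * they hit the command timeout.
             */
            nvme_remove_namespaces(ctrl);

            /*
             * ->delete_ctrl() -> nvme_tcp_teardown_io_queues() calls
             * nvme_sync_io_queues() only now, but ctrl->namespaces is
             * already empty, so no in-flight I/O gets synced.
             */
            ctrl->ops->delete_ctrl(ctrl);
    }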

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



