[PATCHv17 00/11] nvme: In-band authentication support

Sagi Grimberg sagi at grimberg.me
Sun Jun 26 06:01:09 PDT 2022


>>> Hi all,
>>>
>>> recent updates to the NVMe spec have added definitions for in-band
>>> authentication, and seeing that it provides some real benefit
>>> especially for NVMe-TCP here's an attempt to implement it.
>>>
>>> Thanks to Nicolai Stange the crypto DH framework has been upgraded
>>> to provide us with a FFDHE implementation; I've updated the patchset
>>> to use the ephemeral key generation provided there.
>>>
>>> Note that this is just for in-band authentication. Secure
>>> concatenation (ie starting TLS with the negotiated parameters)
>>> requires a TLS handshake, which the in-kernel TLS implementation
>>> does not provide. This is being worked on with a different patchset
>>> which is still WIP.
>>>
>>> The nvme-cli support has already been merged; please use the latest
>>> nvme-cli git repository to build the most recent version.
>>>
>>> A copy of this patchset can be found at
>>> git://git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel
>>> branch auth.v17
>>>
>>> The patchset is being cut against nvme-5.20.
>>>
>>> As usual, comments and reviews are welcome.
>>
>>
>> This looks better, Hannes.
>>
>> Few questions:
>> 1. When we set dhgroups, should we try loading dh-generic (host and
>> target)? I forgot to load these, and it took me some time to figure
>> out why things were not working.
>>
> Ah. Hmm. Yeah, I guess we should (with the current code, anyway).
> But maybe it's possible to make the code more resilient and simply
> disallow DH groups if the module isn't loaded.
> I'll check.

Either an explanatory message or a simple request_module() would be
fine...
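To illustrate the request_module() suggestion, here is a rough sketch of what such a fallback might look like on the host side. This is not the actual patch; the function name, the warning text, and the exact transform name are illustrative assumptions only:

```c
/*
 * Hypothetical sketch, not code from this patchset: before failing
 * DH group selection, explicitly ask for the provider module and
 * print a hint if it still cannot be found. The helper name
 * nvme_auth_get_dh_tfm() and the message wording are made up here.
 */
#include <linux/kmod.h>
#include <crypto/kpp.h>

static struct crypto_kpp *nvme_auth_get_dh_tfm(const char *kpp_name)
{
	struct crypto_kpp *tfm;

	tfm = crypto_alloc_kpp(kpp_name, 0, 0);
	if (IS_ERR(tfm)) {
		/* Provider (e.g. dh_generic) may not be loaded yet. */
		request_module("crypto-%s", kpp_name);
		tfm = crypto_alloc_kpp(kpp_name, 0, 0);
	}
	if (IS_ERR(tfm))
		pr_warn("nvme-auth: DH group %s unavailable; "
			"is the dh-generic module loaded?\n", kpp_name);
	return tfm;
}
```

Either half of this (the request_module() retry or just the pr_warn() hint) would address the confusion described above.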

> 
>> 2. Sometimes when I set up misconfigured authentication, it doesn't
>> fail immediately but blocks until the admin timeout expires.
>> For example, when I didn't load dh-generic on the host:
>> -- 
>> [ 1618.030365] nvme nvme0: new ctrl: NQN 
>> "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.123.1:8009
>> [ 1618.121852] nvme nvme1: qid 0: error -2 initializing DH group 
>> ffdhe4096
>> [ 1618.123944] nvme nvme1: qid 0: authenticated
>> [ 1680.738012] nvme nvme1: queue 0: timeout request 0x1 type 4
>> [ 1680.738155] nvme nvme1: Property Get error: 881, offset 0x0
>> [ 1680.738165] nvme nvme1: Reading CAP failed (881)
>> -- 
>>
> Curious.
> Does this help?
> 
> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
> index af3a7845ee76..285a9d6fbf1b 100644
> --- a/drivers/nvme/host/auth.c
> +++ b/drivers/nvme/host/auth.c
> @@ -740,6 +740,7 @@ static void __nvme_auth_work(struct work_struct *work)
>          ret = nvme_auth_process_dhchap_challenge(ctrl, chap);
>          if (ret) {
>                  /* Invalid challenge parameters */
> +               chap->error = ret;
>                  goto fail2;
>          }

Yes, that works.

> 
> 
>> 3. Not sure if this is related, but now I see a new memory leak:
>> -- 
>> unreferenced object 0xffff8fa0c91d2080 (size 128):
>>    comm "kworker/u8:4", pid 262, jiffies 4294949965 (age 1855.437s)
>>    hex dump (first 32 bytes):
>>      e0 a2 c9 96 ff ff ff ff 90 0f dd f4 a0 8f ff ff  ................
>>      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>    backtrace:
>>      [<000000009524c6a1>] blk_iolatency_init+0x25/0x160
>>      [<0000000074c7283e>] blkcg_init_queue+0xb4/0x120
>>      [<00000000747d5b28>] __alloc_disk_node+0xeb/0x1d0
>>      [<0000000082cf1eb2>] __blk_alloc_disk+0x31/0x60
>>      [<000000008e36f1d8>] nvme_mpath_alloc_disk+0xcc/0x1c0 [nvme_core]
>>      [<00000000f81c9db1>] nvme_alloc_ns_head+0x12b/0x250 [nvme_core]
>>      [<00000000eba12e37>] nvme_init_ns_head+0x255/0x2d0 [nvme_core]
>>      [<000000006f769fbb>] nvme_alloc_ns+0x114/0x4b0 [nvme_core]
>>      [<00000000cf38f67b>] nvme_validate_or_alloc_ns+0x9e/0x1c0 
>> [nvme_core]
>>      [<00000000db73ed81>] nvme_scan_ns_list+0xf7/0x2c0 [nvme_core]
>>      [<0000000011b21727>] nvme_scan_work+0xde/0x270 [nvme_core]
>>      [<000000000f7941ae>] process_one_work+0x1e5/0x3b0
>>      [<000000008eb36ec1>] worker_thread+0x50/0x3a0
>>      [<00000000e58a93ca>] kthread+0xe8/0x110
>>      [<00000000e82e51e5>] ret_from_fork+0x22/0x30
>> -- 
> 
> Hmm. Looks like a generic nvme issue, so it's unlikely to have been
> introduced by this patchset. We should still fix it, though :-)

I'll have a look.
