[PATCH blktests v1 0/2] extend nvme/045 to reconnect with invalid key

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Wed Mar 6 00:44:48 PST 2024


On Mar 05, 2024 / 12:18, Daniel Wagner wrote:
> On Tue, Mar 05, 2024 at 09:44:45AM +0000, Shinichiro Kawasaki wrote:
> > On Mar 04, 2024 / 17:13, Daniel Wagner wrote:
> > > This is the test case for
> > > 
> > > https://lore.kernel.org/linux-nvme/20240304161006.19328-1-dwagner@suse.de/
> > >
> > > 
> > > Daniel Wagner (2):
> > >   nvme/rc: add reconnect-delay argument only for fabrics transports
> > >   nvme/048: add reconnect after ctrl key change
> > 
> > I applied the kernel patches in the link above to v6.8-rc7, then ran
> > nvme/045 with the blktests patches in this series, and I observed
> > failures of the test case with various transports [1]. Is this
> > failure expected?
> 
> If you have these patches applied, the test should pass. But we might
> still have some more things to unify between the transports. The
> nvme/045 test passes in my setup, though I have seen runs that hung
> for some reason; I haven't figured out yet what is happening there.
> But I haven't seen failures, IIRC.
> 
> I am not really surprised we are seeing some fallout, though. We are
> starting to test the error code paths with this test extension.
> 
> > Also, I observed a KASAN double-free [2]. Do you observe it in your
> > environment? I created a quick fix [3], and it appears to resolve the
> > double-free.
> 
> No, I haven't seen this.
> 
> > sudo ./check nvme/045
> > nvme/045 (Test re-authentication)                            [failed]
> >     runtime  8.069s  ...  7.639s
> >     --- tests/nvme/045.out      2024-03-05 18:09:07.267668493 +0900
> >     +++ /home/shin/Blktests/blktests/results/nodev/nvme/045.out.bad     2024-03-05 18:10:07.735494384 +0900
> >     @@ -9,5 +9,6 @@
> >      Change hash to hmac(sha512)
> >      Re-authenticate with changed hash
> >      Renew host key on the controller and force reconnect
> >     -disconnected 0 controller(s)
> >     +controller "nvme1" not deleted within 5 seconds
> >     +disconnected 1 controller(s)
> >      Test complete
> 
> That means the host either successfully reconnected or never
> disconnected. We have another test case just for the disconnect
> (number of queue changes), so if that test passes, it must be the
> former... That shouldn't really happen; it would mean the auth code
> has a bug.

The test case nvme/048 passes, so this looks like a bug.

> 
> > diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
> > index f2832f70e7e0..4e161d3cd840 100644
> > --- a/drivers/nvme/host/sysfs.c
> > +++ b/drivers/nvme/host/sysfs.c
> > @@ -221,14 +221,10 @@ static int ns_update_nuse(struct nvme_ns *ns)
> >  
> >  	ret = nvme_identify_ns(ns->ctrl, ns->head->ns_id, &id);
> >  	if (ret)
> > -		goto out_free_id;
> > +		return ret;
> 
> Yes, this is correct.
> >  
> >  	ns->head->nuse = le64_to_cpu(id->nuse);
> > -
> > -out_free_id:
> > -	kfree(id);
> > -
> > -	return ret;
> > +	return 0;
> >  }
> >
> 
> I think you still need to free 'id' on the normal exit path, though.

Thanks, I posted the patch with the fix.
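
For reference, this is roughly what the corrected exit paths look like: a
sketch combining the quick fix [3] with the missing kfree() on the success
path. It assumes nvme_identify_ns() frees the buffer itself when it fails,
which is what the double-free and the "Identify namespace failed" warning
in the log below suggest:

static int ns_update_nuse(struct nvme_ns *ns)
{
	struct nvme_id_ns *id;
	int ret;

	/* ... */

	ret = nvme_identify_ns(ns->ctrl, ns->head->ns_id, &id);
	if (ret)
		/* nvme_identify_ns() already freed id on failure */
		return ret;

	ns->head->nuse = le64_to_cpu(id->nuse);
	kfree(id);	/* on success the buffer is owned by the caller */
	return 0;
}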

> 
> If you have these patches applied, the test should pass. But we might
> still have some more things to unify between the transports. The
> nvme/045 test passes in my setup, though I have seen runs that hung
> for some reason; I haven't figured out yet what is happening there.
> But I haven't seen failures.

Even with the double-free fix, I still observe the nvme/045 failure for
the rdma, tcp and fc transports. I wonder where the difference between
your system and mine comes from.

FYI, here are the kernel messages for the rdma transport. They show that
nvme_rdma_reconnect_or_remove() was called repeatedly and kept trying to
reconnect. The status argument is -111 or 880, so I think the recon flag
is always true and has no effect. I'm interested in the status values in
your environment.
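
For context, the gating I mean looks roughly like this (my paraphrase of
the intent of the patch series, not its exact code). -111 is -ECONNREFUSED,
and 880 is 0x370, which I read as NVME_SC_HOST_PATH_ERROR; neither carries
the DNR bit, so recon never becomes false:

static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl,
		int status)
{
	/*
	 * status is a negative errno or a positive NVMe status code.
	 * Only a positive status with NVME_SC_DNR set should make the
	 * failure fatal.  With -111 or 880 the DNR bit is never set,
	 * so the host keeps retrying until ctrl_loss_tmo expires.
	 */
	bool recon = !(status > 0 && (status & NVME_SC_DNR));

	if (recon && nvmf_should_reconnect(&ctrl->ctrl)) {
		queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
				ctrl->ctrl.opts->reconnect_delay * HZ);
	} else {
		dev_info(ctrl->ctrl.device, "Removing controller...\n");
		nvme_delete_ctrl(&ctrl->ctrl);
	}
}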


[   59.117607] run blktests nvme/045 at 2024-03-06 17:05:55
[   59.198629] (null): rxe_set_mtu: Set mtu to 1024
[   59.211185] PCLMULQDQ-NI instructions are not detected.
[   59.362952] infiniband ens3_rxe: set active
[   59.363765] infiniband ens3_rxe: added ens3
[   59.540499] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   59.560541] nvmet_rdma: enabling port 0 (10.0.2.15:4420)
[   59.688866] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[   59.701114] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[   59.702195] nvme nvme1: qid 0: controller authenticated
[   59.703310] nvme nvme1: qid 0: authenticated
[   59.707478] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[   59.709883] nvme nvme1: creating 4 I/O queues.
[   59.745087] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[   59.786869] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 10.0.2.15:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[   59.999761] nvme nvme1: re-authenticating controller
[   60.010902] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[   60.011640] nvme nvme1: qid 0: controller authenticated
[   60.025652] nvme nvme1: re-authenticating controller
[   60.035349] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[   60.036375] nvme nvme1: qid 0: controller authenticated
[   60.050449] nvme nvme1: re-authenticating controller
[   60.060757] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[   60.061460] nvme nvme1: qid 0: controller authenticated
[   62.662430] nvme nvme1: re-authenticating controller
[   62.859510] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[   62.860502] nvme nvme1: qid 0: controller authenticated
[   63.029182] nvme nvme1: re-authenticating controller
[   63.192844] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[   63.193900] nvme nvme1: qid 0: controller authenticated
[   63.608561] nvme nvme1: starting error recovery
[   63.653699] nvme nvme1: Reconnecting in 1 seconds...
[   64.712627] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[   64.868896] nvmet: ctrl 1 qid 0 host response mismatch
[   64.870065] nvmet: ctrl 1 qid 0 failure1 (1)
[   64.871152] nvmet: ctrl 1 fatal error occurred!
[   64.871519] nvme nvme1: qid 0: authentication failed
[   64.874330] nvme nvme1: failed to connect queue: 0 ret=-111
[   64.878612] nvme nvme1: Failed reconnect attempt 1
[   64.880472] nvme nvme1: Reconnecting in 1 seconds...
[   66.040957] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[   66.200862] nvmet: ctrl 1 qid 0 host response mismatch
[   66.203005] nvmet: ctrl 1 qid 0 failure1 (1)
[   66.204873] nvmet: ctrl 1 fatal error occurred!
[   66.205148] nvme nvme1: qid 0: authentication failed
[   66.208609] nvme nvme1: failed to connect queue: 0 ret=-111
[   66.212033] nvme nvme1: Failed reconnect attempt 2
[   66.213837] nvme nvme1: Reconnecting in 1 seconds...
[   67.327576] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[   67.485392] nvmet: ctrl 1 qid 0 host response mismatch
[   67.487440] nvmet: ctrl 1 qid 0 failure1 (1)
[   67.489403] nvmet: ctrl 1 fatal error occurred!
[   67.489565] nvme nvme1: qid 0: authentication failed
[   67.493015] nvme nvme1: failed to connect queue: 0 ret=-111
[   67.496909] nvme nvme1: Failed reconnect attempt 3
[   67.498692] nvme nvme1: Reconnecting in 1 seconds...
[   68.610640] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[   68.739298] nvme nvme1: Identify namespace failed (880)
[   68.742833] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[   68.774125] nvmet: ctrl 1 qid 0 host response mismatch
[   68.776440] nvme nvme1: qid 0 auth_send failed with status 880
[   68.778133] nvme nvme1: qid 0 failed to receive success1, nvme status 880
[   68.780300] nvme nvme1: qid 0: authentication failed
[   68.782490] nvme nvme1: failed to connect queue: 0 ret=880
[   68.785335] nvme nvme1: Failed reconnect attempt 4
[   68.829188] nvme nvme1: Property Set error: 880, offset 0x14
[   69.634482] rdma_rxe: unloaded



