[PATCH blktests v1 0/2] extend nvme/045 to reconnect with invalid key

Daniel Wagner dwagner at suse.de
Wed Mar 6 01:36:29 PST 2024


On Wed, Mar 06, 2024 at 08:44:48AM +0000, Shinichiro Kawasaki wrote:
> > > sudo ./check nvme/045
> > > nvme/045 (Test re-authentication)                            [failed]
> > >     runtime  8.069s  ...  7.639s
> > >     --- tests/nvme/045.out      2024-03-05 18:09:07.267668493 +0900
> > >     +++ /home/shin/Blktests/blktests/results/nodev/nvme/045.out.bad     2024-03-05 18:10:07.735494384 +0900
> > >     @@ -9,5 +9,6 @@
> > >      Change hash to hmac(sha512)
> > >      Re-authenticate with changed hash
> > >      Renew host key on the controller and force reconnect
> > >     -disconnected 0 controller(s)
> > >     +controller "nvme1" not deleted within 5 seconds
> > >     +disconnected 1 controller(s)
> > >      Test complete
> > 
> > That means the host either successfully reconnected or never
> > disconnected. We have another test case just for the disconnect test
> > (number of queue changes), so if this test passes, it must be the
> > former... That shouldn't really happen; it would mean the auth code has a bug.
> 
> The test case nvme/048 passes, so this looks like a bug.

I'll try to recreate it.

> > If you have these patches applied, the test should pass. But we might
> > have still some more stuff to unify between the transports. The nvme/045
> > test passes in my setup. Though I have seen runs which hung for some
> > reason. I haven't figured out yet what's happening there, but I
> > haven't seen failures.
> 
> Still with the fix of the double-free, I observe the nvme/045 failure for rdma,
> tcp and fc transports. I wonder where the difference between your system and
> mine comes from.
> 
> FYI, here I share the kernel messages for rdma transport. It shows that
> nvme_rdma_reconnect_or_remove() was called repeatedly and it tried to reconnect.
> The status argument is -111 or 880, so I think the recon flag is always true
> and has no effect. I'm interested in the status values in your environment.

Do you have these patches applied:

https://lore.kernel.org/linux-nvme/20240305080005.3638-1-dwagner@suse.de/

?

> [   59.117607] run blktests nvme/045 at 2024-03-06 17:05:55
> [   59.198629] (null): rxe_set_mtu: Set mtu to 1024
> [   59.211185] PCLMULQDQ-NI instructions are not detected.
> [   59.362952] infiniband ens3_rxe: set active
> [   59.363765] infiniband ens3_rxe: added ens3
> [   59.540499] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [   59.560541] nvmet_rdma: enabling port 0 (10.0.2.15:4420)
> [   59.688866] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
> [   59.701114] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> [   59.702195] nvme nvme1: qid 0: controller authenticated
> [   59.703310] nvme nvme1: qid 0: authenticated
> [   59.707478] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
> [   59.709883] nvme nvme1: creating 4 I/O queues.
> [   59.745087] nvme nvme1: mapped 4/0/0 default/read/poll queues.
> [   59.786869] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 10.0.2.15:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [   59.999761] nvme nvme1: re-authenticating controller
> [   60.010902] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> [   60.011640] nvme nvme1: qid 0: controller authenticated
> [   60.025652] nvme nvme1: re-authenticating controller
> [   60.035349] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> [   60.036375] nvme nvme1: qid 0: controller authenticated
> [   60.050449] nvme nvme1: re-authenticating controller
> [   60.060757] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> [   60.061460] nvme nvme1: qid 0: controller authenticated
> [   62.662430] nvme nvme1: re-authenticating controller
> [   62.859510] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
> [   62.860502] nvme nvme1: qid 0: controller authenticated
> [   63.029182] nvme nvme1: re-authenticating controller
> [   63.192844] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
> [   63.193900] nvme nvme1: qid 0: controller authenticated
> [   63.608561] nvme nvme1: starting error recovery
> [   63.653699] nvme nvme1: Reconnecting in 1 seconds...
> [   64.712627] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
> [   64.868896] nvmet: ctrl 1 qid 0 host response mismatch
> [   64.870065] nvmet: ctrl 1 qid 0 failure1 (1)
> [   64.871152] nvmet: ctrl 1 fatal error occurred!
> [   64.871519] nvme nvme1: qid 0: authentication failed
> [   64.874330] nvme nvme1: failed to connect queue: 0 ret=-111
> [   64.878612] nvme nvme1: Failed reconnect attempt 1
> [   64.880472] nvme nvme1: Reconnecting in 1 seconds...

This looks like the DNR bit is not considered in the
nvme_rdma_reconnect_or_remove() function.
