[PATCH blktests v1 0/2] extend nvme/045 to reconnect with invalid key
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Wed Mar 6 03:28:47 PST 2024
On Mar 06, 2024 / 10:36, Daniel Wagner wrote:
> On Wed, Mar 06, 2024 at 08:44:48AM +0000, Shinichiro Kawasaki wrote:
> > > > sudo ./check nvme/045
> > > > nvme/045 (Test re-authentication) [failed]
> > > > runtime 8.069s ... 7.639s
> > > > --- tests/nvme/045.out 2024-03-05 18:09:07.267668493 +0900
> > > > +++ /home/shin/Blktests/blktests/results/nodev/nvme/045.out.bad 2024-03-05 18:10:07.735494384 +0900
> > > > @@ -9,5 +9,6 @@
> > > > Change hash to hmac(sha512)
> > > > Re-authenticate with changed hash
> > > > Renew host key on the controller and force reconnect
> > > > -disconnected 0 controller(s)
> > > > +controller "nvme1" not deleted within 5 seconds
> > > > +disconnected 1 controller(s)
> > > > Test complete
> > >
> > > That means the host either successfully reconnected or never
> > > disconnected. We have another test case just for the disconnect test
> > > (number of queue changes), so if this test passes, it must be the
> > > former... That shouldn't really happen; it would mean the auth code has a bug.
> >
> > The test case nvme/048 passes, so this looks like a bug.
>
> I'll try to recreate it.
>
> > > If you have these patches applied, the test should pass. But we might
> > > still have some more stuff to unify between the transports. The nvme/045
> > > test passes in my setup, though I have seen runs that hung for some
> > > reason. I haven't figured out yet what's happening there, but I haven't
> > > seen failures.
> >
> > Even with the double-free fix applied, I still observe the nvme/045 failure
> > for the rdma, tcp and fc transports. I wonder where the difference between
> > your system and mine comes from.
> >
> > FYI, here I share the kernel messages for rdma transport. It shows that
> > nvme_rdma_reconnect_or_remove() was called repeatedly and it tried to reconnect.
> > The status argument is -111 or 880, so I think the recon flag is always true
> > and has no effect. I'm interested in the status values in your environment.
>
> Do you have these patches applied:
>
> https://lore.kernel.org/linux-nvme/20240305080005.3638-1-dwagner@suse.de/
>
> ?
Yes. To be precise, I applied the v2 patches, but I believe they have the same
logic as v3. To be sure, I replaced the v2 patches with v3 and saw the same
failure.
>
> > [ 59.117607] run blktests nvme/045 at 2024-03-06 17:05:55
> > [ 59.198629] (null): rxe_set_mtu: Set mtu to 1024
> > [ 59.211185] PCLMULQDQ-NI instructions are not detected.
> > [ 59.362952] infiniband ens3_rxe: set active
> > [ 59.363765] infiniband ens3_rxe: added ens3
> > [ 59.540499] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> > [ 59.560541] nvmet_rdma: enabling port 0 (10.0.2.15:4420)
> > [ 59.688866] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
> > [ 59.701114] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> > [ 59.702195] nvme nvme1: qid 0: controller authenticated
> > [ 59.703310] nvme nvme1: qid 0: authenticated
> > [ 59.707478] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
> > [ 59.709883] nvme nvme1: creating 4 I/O queues.
> > [ 59.745087] nvme nvme1: mapped 4/0/0 default/read/poll queues.
> > [ 59.786869] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 10.0.2.15:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [ 59.999761] nvme nvme1: re-authenticating controller
> > [ 60.010902] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> > [ 60.011640] nvme nvme1: qid 0: controller authenticated
> > [ 60.025652] nvme nvme1: re-authenticating controller
> > [ 60.035349] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> > [ 60.036375] nvme nvme1: qid 0: controller authenticated
> > [ 60.050449] nvme nvme1: re-authenticating controller
> > [ 60.060757] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
> > [ 60.061460] nvme nvme1: qid 0: controller authenticated
> > [ 62.662430] nvme nvme1: re-authenticating controller
> > [ 62.859510] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
> > [ 62.860502] nvme nvme1: qid 0: controller authenticated
> > [ 63.029182] nvme nvme1: re-authenticating controller
> > [ 63.192844] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
> > [ 63.193900] nvme nvme1: qid 0: controller authenticated
> > [ 63.608561] nvme nvme1: starting error recovery
> > [ 63.653699] nvme nvme1: Reconnecting in 1 seconds...
> > [ 64.712627] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
> > [ 64.868896] nvmet: ctrl 1 qid 0 host response mismatch
> > [ 64.870065] nvmet: ctrl 1 qid 0 failure1 (1)
> > [ 64.871152] nvmet: ctrl 1 fatal error occurred!
> > [ 64.871519] nvme nvme1: qid 0: authentication failed
> > [ 64.874330] nvme nvme1: failed to connect queue: 0 ret=-111
> > [ 64.878612] nvme nvme1: Failed reconnect attempt 1
> > [ 64.880472] nvme nvme1: Reconnecting in 1 seconds...
>
> This looks like the DNR bit is not considered in the
> nvme_rdma_reconnect_or_remove() function.
Agreed. Let me share the kernel messages captured with the printk() added
below. They show that recon is true at every nvme_rdma_reconnect_or_remove()
call.
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 507cbd6a3312..bfb23b9d5d82 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -988,6 +988,7 @@ static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl,
 	enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
 	bool recon = status > 0 && (status & NVME_SC_DNR) ? false : true;
+	printk("%s: status=%d recon=%d\n", __func__, status, recon);
 
 	/* If we are resetting/deleting then do nothing */
 	if (state != NVME_CTRL_CONNECTING) {
 		WARN_ON_ONCE(state == NVME_CTRL_NEW || state == NVME_CTRL_LIVE);
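
For reference, here is how that expression evaluates for the status values
seen in these runs. This is a minimal userspace sketch, not kernel code; the
only kernel value it assumes is NVME_SC_DNR (0x4000 in include/linux/nvme.h).
A negative errno can never satisfy "status > 0", so the DNR test cannot fire
on host-side connect errors:

#include <stdbool.h>
#include <stdio.h>

#define NVME_SC_DNR 0x4000	/* value from include/linux/nvme.h */

/* Same expression as in nvme_rdma_reconnect_or_remove() above. */
static bool recon(int status)
{
	return status > 0 && (status & NVME_SC_DNR) ? false : true;
}

int main(void)
{
	/*
	 * -107, -111 and 880 are the values observed here; the last entry
	 * is a hypothetical status with DNR set, the only shape that would
	 * make recon false.
	 */
	int samples[] = { -107, -111, 880, 880 | NVME_SC_DNR };

	for (int i = 0; i < 4; i++)
		printf("status=%d recon=%d\n", samples[i], recon(samples[i]));
	return 0;
}

It prints recon=1 for all three observed values and recon=0 only for the
artificial DNR case, which matches the printk output below.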
[ 139.201909] run blktests nvme/045 at 2024-03-06 20:22:44
[ 139.277895] (null): rxe_set_mtu: Set mtu to 1024
[ 139.291307] PCLMULQDQ-NI instructions are not detected.
[ 139.451087] infiniband ens3_rxe: set active
[ 139.451856] infiniband ens3_rxe: added ens3
[ 139.627815] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 139.647899] nvmet_rdma: enabling port 0 (10.0.2.15:4420)
[ 139.777796] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[ 139.789952] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 139.791053] nvme nvme1: qid 0: controller authenticated
[ 139.792039] nvme nvme1: qid 0: authenticated
[ 139.796131] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[ 139.798258] nvme nvme1: creating 4 I/O queues.
[ 139.832707] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[ 139.874555] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 10.0.2.15:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 140.084928] nvme nvme1: re-authenticating controller
[ 140.095934] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 140.096705] nvme nvme1: qid 0: controller authenticated
[ 140.111696] nvme nvme1: re-authenticating controller
[ 140.121209] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 140.121885] nvme nvme1: qid 0: controller authenticated
[ 140.138435] nvme nvme1: re-authenticating controller
[ 140.148513] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[ 140.149344] nvme nvme1: qid 0: controller authenticated
[ 142.748523] nvme nvme1: re-authenticating controller
[ 142.944154] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 142.945023] nvme nvme1: qid 0: controller authenticated
[ 143.127665] nvme nvme1: re-authenticating controller
[ 143.293304] nvme nvme1: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe8192
[ 143.294204] nvme nvme1: qid 0: controller authenticated
[ 143.732511] nvme nvme1: starting error recovery
[ 143.785440] nvme_rdma_reconnect_or_remove: status=-107 recon=1
[ 143.786448] nvme nvme1: Reconnecting in 1 seconds...
[ 144.834299] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[ 144.990883] nvmet: ctrl 1 qid 0 host response mismatch
[ 144.991937] nvmet: ctrl 1 qid 0 failure1 (1)
[ 144.992655] nvmet: ctrl 1 fatal error occurred!
[ 144.992953] nvme nvme1: qid 0: authentication failed
[ 144.996695] nvme nvme1: failed to connect queue: 0 ret=-111
[ 145.000090] nvme nvme1: Failed reconnect attempt 1
[ 145.001717] nvme_rdma_reconnect_or_remove: status=-111 recon=1
[ 145.003218] nvme nvme1: Reconnecting in 1 seconds...
[ 146.116430] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[ 146.272948] nvmet: ctrl 1 qid 0 host response mismatch
[ 146.274869] nvmet: ctrl 1 qid 0 failure1 (1)
[ 146.276687] nvmet: ctrl 1 fatal error occurred!
[ 146.276891] nvme nvme1: qid 0: authentication failed
[ 146.280591] nvme nvme1: failed to connect queue: 0 ret=-111
[ 146.283783] nvme nvme1: Failed reconnect attempt 2
[ 146.285339] nvme_rdma_reconnect_or_remove: status=-111 recon=1
[ 146.287206] nvme nvme1: Reconnecting in 1 seconds...
[ 147.395546] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[ 147.563730] nvmet: ctrl 1 qid 0 host response mismatch
[ 147.565628] nvmet: ctrl 1 qid 0 failure1 (1)
[ 147.567256] nvmet: ctrl 1 fatal error occurred!
[ 147.567411] nvme nvme1: qid 0: authentication failed
[ 147.571022] nvme nvme1: failed to connect queue: 0 ret=-111
[ 147.574003] nvme nvme1: Failed reconnect attempt 3
[ 147.575463] nvme_rdma_reconnect_or_remove: status=-111 recon=1
[ 147.576976] nvme nvme1: Reconnecting in 1 seconds...
[ 148.676722] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[ 148.842117] nvmet: ctrl 1 qid 0 host response mismatch
[ 148.844543] nvmet: ctrl 1 qid 0 failure1 (1)
[ 148.846369] nvmet: ctrl 1 fatal error occurred!
[ 148.846478] nvme nvme1: qid 0: authentication failed
[ 148.850217] nvme nvme1: failed to connect queue: 0 ret=-111
[ 148.853850] nvme nvme1: Failed reconnect attempt 4
[ 148.855425] nvme_rdma_reconnect_or_remove: status=-111 recon=1
[ 148.857496] nvme nvme1: Reconnecting in 1 seconds...
[ 149.873720] nvme nvme1: Identify namespace failed (880)
[ 149.877168] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[ 149.927672] nvme nvme1: Property Set error: 880, offset 0x14
[ 150.754847] rdma_rxe: unloaded
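
To decode the status values in this log (my annotation, not part of the
original report), strerror() names the negative errnos, and for positive
values the DNR bit is 0x4000:

#include <stdio.h>
#include <string.h>

#define NVME_SC_DNR 0x4000	/* value from include/linux/nvme.h */

int main(void)
{
	/* The status values that appear in the log above. */
	int status[] = { -107, -111, 880 };

	for (int i = 0; i < 3; i++) {
		int s = status[i];

		if (s < 0)	/* host-side errno: DNR can never be set */
			printf("%d: errno \"%s\"\n", s, strerror(-s));
		else		/* NVMe status word: check the DNR bit */
			printf("%d: NVMe status 0x%x, DNR %s\n", s, s,
			       (s & NVME_SC_DNR) ? "set" : "clear");
	}
	return 0;
}

So -107 and -111 are ENOTCONN and ECONNREFUSED from the failed connect
attempts, and 880 is 0x370 (NVME_SC_HOST_PATH_ERROR, if I read
include/linux/nvme.h correctly) with the DNR bit clear, which is why recon
never becomes false.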