kernel panic due to a nvmet race
Engel, Amit
Amit.Engel at Dell.com
Tue May 17 03:48:14 PDT 2022
Hi All,
We observed a kernel panic which, based on our analysis, is caused by a race in nvmet.
The race is between nvme connect and nvmet tcp port removal.
The scenario:
If nvmet_port_release frees the nvmet port just before nvme connect calls nvmet_find_get_subsys (as part of nvmet_alloc_ctrl), nvmet_find_get_subsys accesses a port that has already been freed:
nvme/target/core.c:
static struct nvmet_subsys *nvmet_find_get_subsys(struct nvmet_port *port,
		const char *subsysnqn)
...snip
	down_read(&nvmet_config_sem);
	list_for_each_entry(p, &port->subsystems, entry) {
		if (!strncmp(p->subsys->subsysnqn, subsysnqn,
crash> bt
PID: 30216 TASK: ffff888c1e163f00 CPU: 0 COMMAND: "nt"
#0 [ffffc90020153858] machine_kexec at ffffffff81062fcc
#1 [ffffc900201538b0] __crash_kexec at ffffffff811273ef
#2 [ffffc90020153978] panic at ffffffff810851f7
#3 [ffffc90020153a18] no_context at ffffffff8107104f
#4 [ffffc90020153a80] page_fault at ffffffff81801184
[exception RIP: nvmet_find_get_subsys+161]
RIP: ffffffffa0bbce01 RSP: ffffc90020153b38 RFLAGS: 00010282
RAX: ffff888c1e163f01 RBX: 0000000000000000 RCX: 0000000000000020
RDX: 0000000000000000 RSI: ffffffffa0bc5895 RDI: ffffffffa0bce040
RBP: ffff88aeafc3f520 R8: ffffc90020153ba0 R9: 0000000000000000
R10: ffffc90020153bf8 R11: ffff888cb8e97b00 R12: ffff888bb3469a00
R13: ffff888bb3469900 R14: ffffc9000c41ba70 R15: ffffc90020153ba0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffffc90020153b58] nvmet_alloc_ctrl at ffffffffa0bbe4c2 [nvmet]
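To make the ordering concrete, here is a minimal user-space sketch of the same interleaving. It is not the nvmet code and not a proposed patch. All model_* names are hypothetical, a singly linked list stands in for port->subsystems, and a pthread mutex stands in for nvmet_config_sem. It assumes one possible way to make the ordering safe: the connect path pins the port with a reference count before the lookup, so a removal that lands in between only unlinks the subsystems and the memory stays valid until the last reference is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-ins, not the kernel's structures. */
struct model_link {
	const char *subsysnqn;
	struct model_link *next;
};

struct model_port {
	int refs;			/* protected by model_lock */
	int removed;			/* set once removal has started */
	struct model_link *subsystems;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_ports_freed;

static struct model_port *model_port_create(const char *nqn)
{
	struct model_port *port = calloc(1, sizeof(*port));
	struct model_link *link = calloc(1, sizeof(*link));

	if (!port || !link) {
		free(port);
		free(link);
		return NULL;
	}
	link->subsysnqn = nqn;
	port->subsystems = link;
	port->refs = 1;			/* reference owned by the configuration */
	return port;
}

/* Connect path: pin the port before using it; fails if removal started. */
static int model_port_get(struct model_port *port)
{
	int ok;

	pthread_mutex_lock(&model_lock);
	ok = !port->removed;
	if (ok)
		port->refs++;
	pthread_mutex_unlock(&model_lock);
	return ok;
}

/* Drop one reference; the memory goes away only with the last one. */
static void model_port_put(struct model_port *port)
{
	int last;

	pthread_mutex_lock(&model_lock);
	assert(port->refs > 0);
	last = (--port->refs == 0);
	pthread_mutex_unlock(&model_lock);
	if (last) {
		free(port);
		model_ports_freed++;
	}
}

/* Removal path: unlink the subsystems, then drop the config reference. */
static void model_port_remove(struct model_port *port)
{
	pthread_mutex_lock(&model_lock);
	port->removed = 1;
	while (port->subsystems) {
		struct model_link *link = port->subsystems;

		port->subsystems = link->next;
		free(link);
	}
	pthread_mutex_unlock(&model_lock);
	model_port_put(port);
}

/* Lookup: walks the list under the lock, on a port the caller has pinned. */
static int model_find_subsys(struct model_port *port, const char *nqn)
{
	struct model_link *link;
	int found = 0;

	pthread_mutex_lock(&model_lock);
	for (link = port->subsystems; link; link = link->next)
		if (!strcmp(link->subsysnqn, nqn))
			found = 1;
	pthread_mutex_unlock(&model_lock);
	return found;
}

/* Replays the reported ordering: removal lands between pin and lookup. */
static int model_run_scenario(void)
{
	struct model_port *port = model_port_create("nqn.example");
	int base = model_ports_freed;
	int found, freed_early;

	if (!port || !model_port_get(port))
		return -1;
	model_port_remove(port);	/* races in before the lookup */
	freed_early = model_ports_freed - base;
	found = model_find_subsys(port, "nqn.example");
	model_port_put(port);		/* last reference frees the port */
	return freed_early == 0 && found == 0 && model_ports_freed == base + 1;
}
```

In this model, model_run_scenario() returns 1 when the lookup finds no subsystem on the removed port and the port is freed only after the connect path drops its reference. Without the model_port_get() call, the same ordering would have the lookup reading freed memory, which is the pattern the backtrace above suggests.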
Could you please review and share your input?
Thanks,
Amit