kernel panic due to a nvmet race

Engel, Amit Amit.Engel at Dell.com
Tue May 17 03:48:14 PDT 2022


Hi All,

We observed a kernel panic which, based on our analysis, is caused by a race in nvmet.
The race is between nvme connect and nvmet tcp port removal.
The scenario:
If nvmet_port_release frees the nvmet port just before nvme connect calls nvmet_find_get_subsys (as part of nvmet_alloc_ctrl), nvmet_find_get_subsys accesses a port that has already been freed:

nvme/target/core.c:
static struct nvmet_subsys *nvmet_find_get_subsys(struct nvmet_port *port,
		const char *subsysnqn)
...snip
	down_read(&nvmet_config_sem);
	list_for_each_entry(p, &port->subsystems, entry) {
		if (!strncmp(p->subsys->subsysnqn, subsysnqn,

crash> bt
PID: 30216  TASK: ffff888c1e163f00  CPU: 0   COMMAND: "nt"
 #0 [ffffc90020153858] machine_kexec at ffffffff81062fcc
 #1 [ffffc900201538b0] __crash_kexec at ffffffff811273ef
 #2 [ffffc90020153978] panic at ffffffff810851f7
 #3 [ffffc90020153a18] no_context at ffffffff8107104f
 #4 [ffffc90020153a80] page_fault at ffffffff81801184
    [exception RIP: nvmet_find_get_subsys+161]
    RIP: ffffffffa0bbce01  RSP: ffffc90020153b38  RFLAGS: 00010282
    RAX: ffff888c1e163f01  RBX: 0000000000000000  RCX: 0000000000000020
    RDX: 0000000000000000  RSI: ffffffffa0bc5895  RDI: ffffffffa0bce040
    RBP: ffff88aeafc3f520   R8: ffffc90020153ba0   R9: 0000000000000000
    R10: ffffc90020153bf8  R11: ffff888cb8e97b00  R12: ffff888bb3469a00
    R13: ffff888bb3469900  R14: ffffc9000c41ba70  R15: ffffc90020153ba0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffc90020153b58] nvmet_alloc_ctrl at ffffffffa0bbe4c2 [nvmet]
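
To make the lifetime problem concrete, here is a minimal userspace sketch of one way such a window could be closed: the connect path takes a reference on the port under the config lock before it walks the port, and removal only unpublishes the port and drops its own reference. This is a model for discussion, not nvmet code and not a proposed patch. Every model_* name, the config_lock mutex (standing in for nvmet_config_sem) and the NQN string are hypothetical.

```c
/* Userspace model only. Names loosely mirror the report; nothing here is nvmet code. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_SUBSYS 4

struct model_port {
	int refs;				/* protected by config_lock */
	int nr_subsys;
	const char *subsysnqn[MODEL_MAX_SUBSYS];
};

static pthread_mutex_t config_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_port *registered_port;	/* pointer the connect path looks up */
static int ports_freed;				/* protected by config_lock */

/* Create a port, link one subsystem NQN to it and publish it. */
static struct model_port *model_port_create(const char *nqn)
{
	struct model_port *port = calloc(1, sizeof(*port));

	if (!port)
		return NULL;
	port->refs = 1;				/* reference owned by the registration */
	port->subsysnqn[port->nr_subsys++] = nqn;
	pthread_mutex_lock(&config_lock);
	registered_port = port;
	pthread_mutex_unlock(&config_lock);
	return port;
}

/* Connect side: look up the port and pin it in the same critical section. */
static struct model_port *model_port_get(void)
{
	struct model_port *port;

	pthread_mutex_lock(&config_lock);
	port = registered_port;
	if (port)
		port->refs++;
	pthread_mutex_unlock(&config_lock);
	return port;
}

/* Drop one reference; the memory is freed only when the last one goes away. */
static void model_port_put(struct model_port *port)
{
	int last;

	pthread_mutex_lock(&config_lock);
	assert(port->refs > 0);
	last = (--port->refs == 0);
	if (last)
		ports_freed++;
	pthread_mutex_unlock(&config_lock);
	if (last)
		free(port);
}

/* Removal side: unpublish first, then drop the registration reference. */
static void model_port_release(void)
{
	struct model_port *port;

	pthread_mutex_lock(&config_lock);
	port = registered_port;
	registered_port = NULL;
	pthread_mutex_unlock(&config_lock);
	if (port)
		model_port_put(port);
}

/* Walk the port's subsystem list; the caller must hold a reference. */
static int model_find_subsys(struct model_port *port, const char *nqn)
{
	int i, found = 0;

	pthread_mutex_lock(&config_lock);
	for (i = 0; i < port->nr_subsys; i++)
		if (!strcmp(port->subsysnqn[i], nqn))
			found = 1;
	pthread_mutex_unlock(&config_lock);
	return found;
}

/* Replay the reported ordering deterministically; returns 0 if the walk was safe. */
static int model_run_scenario(void)
{
	int freed_before = ports_freed;
	struct model_port *port;
	int found;

	if (!model_port_create("nqn.example:subsys1"))
		return -1;
	port = model_port_get();		/* connect pins the port */
	model_port_release();			/* removal happens before the lookup */
	if (ports_freed != freed_before)	/* still pinned, so not freed yet */
		return -2;
	found = model_find_subsys(port, "nqn.example:subsys1");
	model_port_put(port);			/* last reference: freed here */
	if (ports_freed != freed_before + 1)
		return -3;
	if (model_port_get() != NULL)		/* a later connect finds no port */
		return -4;
	return found ? 0 : -5;
}
```

In this model, removal can no longer free the port while a connect is walking it, and a connect that arrives after removal gets NULL instead of a stale pointer. Whether the real fix belongs in nvmet core or in the tcp transport is left open here.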

Could you please review and share your input?

Thanks,
Amit



