Occasional kernel error with NVMe-oF TCP target
Maurizio Lombardi
mlombard at redhat.com
Wed Aug 21 03:57:15 PDT 2024
> Is the stack trace enough to go on? Is this already fixed in a newer
> kernel? Do you need me to gather more information?
The stack trace was more than sufficient; the problem is still reproducible on 6.11.0-rc4.
What happens is that nvmet-tcp fails to allocate memory for the commands,
and the queue release procedure then dereferences a NULL pointer.
Easy to reproduce with this patch:
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 5bff0d5464d1..9cdb6e81169f 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1516,7 +1516,7 @@ static int nvmet_tcp_alloc_cmds(struct nvmet_tcp_queue *queue)
 	struct nvmet_tcp_cmd *cmds;
 	int i, ret = -EINVAL, nr_cmds = queue->nr_cmds;
-	cmds = kcalloc(nr_cmds, sizeof(struct nvmet_tcp_cmd), GFP_KERNEL);
+	cmds = NULL;
 	if (!cmds)
 		goto out;
[ 66.397317] nvmet: failed to install queue 0 cntlid 1 ret 6
[ 66.398385] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[...]
[ 66.413549] Call trace:
[ 66.413739] nvmet_tcp_release_queue_work+0xe8/0x2f0 [nvmet_tcp]
[ 66.414199] process_one_work+0x188/0x410
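To illustrate the failure pattern outside the kernel, here is a minimal userspace sketch. All names in it (demo_cmd, demo_queue, demo_alloc_cmds, demo_release_queue) are hypothetical stand-ins, not the nvmet_tcp structures or functions, and this is not the driver's code or a proposed fix. It only shows a release path that checks whether the command array was ever allocated before walking it, assuming the allocation step leaves the pointer NULL on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only. */
struct demo_cmd {
	void *buf;	/* per-command buffer, NULL until used */
};

struct demo_queue {
	int nr_cmds;
	struct demo_cmd *cmds;	/* stays NULL if the allocation fails */
};

/*
 * Allocate the command array. fail_alloc simulates the allocator
 * returning NULL, much like forcing "cmds = NULL" in the patch above.
 * Returns 0 on success, -1 on failure.
 */
static int demo_alloc_cmds(struct demo_queue *q, bool fail_alloc)
{
	assert(q != NULL);
	q->cmds = fail_alloc ? NULL
			     : calloc((size_t)q->nr_cmds, sizeof(*q->cmds));
	return q->cmds ? 0 : -1;
}

/*
 * Release path that tolerates a queue whose commands were never
 * allocated. Returns the number of commands that were torn down.
 */
static int demo_release_queue(struct demo_queue *q)
{
	int released = 0;

	assert(q != NULL);
	if (!q->cmds)
		return 0;	/* nothing allocated, nothing to walk */

	for (int i = 0; i < q->nr_cmds; i++) {
		free(q->cmds[i].buf);	/* free(NULL) is a no-op */
		released++;
	}
	free(q->cmds);
	q->cmds = NULL;
	return released;
}
```

Without the NULL check in demo_release_queue(), the loop would read q->cmds[0].buf through a NULL base pointer after a failed allocation, which is the same kind of access that the trace above reports.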
Maurizio