Occasional kernel error with NVMe-oF TCP target

Wed Aug 21 03:57:15 PDT 2024

> Is the stack trace enough to go on? Is this already fixed in a newer
> kernel? Do you need me to gather more information?

The stack trace was more than sufficient; the problem is still
reproducible on 6.11.0-rc4.
What happens is that nvmet-tcp fails to allocate memory for the
commands, and the queue release procedure then dereferences a NULL
pointer.
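
To illustrate the pattern, here is a minimal userspace sketch. It is
not the driver code and not a proposed fix: demo_queue, demo_cmd,
demo_alloc_cmds() and demo_release_queue() are hypothetical names, and
I am assuming the release path walks the command array. The NULL check
in the release function is the kind of guard that avoids the crash
when the allocation never succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real nvmet_tcp structures. */
struct demo_cmd {
	void *buf;
};

struct demo_queue {
	int nr_cmds;
	struct demo_cmd *cmds;	/* stays NULL if the allocation failed */
};

/* Allocate the command array; simulate_failure mimics "cmds = NULL". */
static int demo_alloc_cmds(struct demo_queue *q, int simulate_failure)
{
	struct demo_cmd *cmds;

	cmds = simulate_failure ? NULL : calloc(q->nr_cmds, sizeof(*cmds));
	if (!cmds)
		return -1;

	q->cmds = cmds;
	return 0;
}

/* Release the queue; returns the number of commands released. */
static int demo_release_queue(struct demo_queue *q)
{
	int i, released = 0;

	assert(q != NULL);

	/* Without this check, q->cmds[i] below would dereference NULL. */
	if (!q->cmds)
		return 0;

	for (i = 0; i < q->nr_cmds; i++) {
		free(q->cmds[i].buf);	/* free(NULL) is a no-op */
		released++;
	}
	free(q->cmds);
	q->cmds = NULL;
	return released;
}
```

Calling demo_alloc_cmds(&q, 1) followed by demo_release_queue(&q)
returns 0 instead of crashing; with the check removed, the loop would
fault on the first access.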

Easy to reproduce with this patch:

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 5bff0d5464d1..9cdb6e81169f 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1516,7 +1516,7 @@ static int nvmet_tcp_alloc_cmds(struct nvmet_tcp_queue *queue)
        struct nvmet_tcp_cmd *cmds;
        int i, ret = -EINVAL, nr_cmds = queue->nr_cmds;

-       cmds = kcalloc(nr_cmds, sizeof(struct nvmet_tcp_cmd), GFP_KERNEL);
+       cmds = NULL;
        if (!cmds)
                goto out;


[   66.397317] nvmet: failed to install queue 0 cntlid 1 ret 6
[   66.398385] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[...]
[   66.413549] Call trace:
[   66.413739]  nvmet_tcp_release_queue_work+0xe8/0x2f0 [nvmet_tcp]
[   66.414199]  process_one_work+0x188/0x410

Maurizio