Occasional kernel error with NVMe-oF TCP target

Jonas Konrad me at yawk.at
Wed Aug 21 06:31:03 PDT 2024


Thanks!

So from what I understand, this patch fixes the "NULL pointer 
dereference" part. However, the allocation failure and the associated 
error still remain, yes? And I assume the NVMe-oF connection would still 
fail?
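
To check my understanding of the crash, here is a minimal user-space 
sketch of the pattern. It is not the kernel code: struct cmd, struct 
queue, alloc_cmds(), release_queue() and the fail_at parameter are 
hypothetical stand-ins, and I am assuming the command count is set on 
the queue before the allocation runs. The last line of the error path 
mirrors the one-line fix in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the target's structures, not the real ones. */
struct cmd {
	void *buf;
};

struct queue {
	struct cmd *cmds;
	int nr_cmds;	/* assumed to be set by the caller before alloc_cmds() */
};

/*
 * Allocate q->nr_cmds commands. A fail_at value >= 0 simulates an
 * allocation failure at that index; pass -1 for no simulated failure.
 */
static int alloc_cmds(struct queue *q, int fail_at)
{
	struct cmd *cmds;
	int i, nr = q->nr_cmds;

	cmds = calloc(nr, sizeof(*cmds));
	if (!cmds)
		goto out;

	for (i = 0; i < nr; i++) {
		cmds[i].buf = (i == fail_at) ? NULL : malloc(64);
		if (!cmds[i].buf)
			goto out_free;
	}

	q->cmds = cmds;
	return 0;

out_free:
	/* Undo the allocations that succeeded before the failure. */
	while (--i >= 0)
		free(cmds[i].buf);
	free(cmds);
out:
	/*
	 * Equivalent of the patch: without this, release_queue() would
	 * loop nr_cmds times over q->cmds, which is still NULL.
	 */
	q->nr_cmds = 0;
	return -1;
}

static void release_queue(struct queue *q)
{
	int i;

	/* Invariant the fix restores: a nonzero count implies a valid array. */
	assert(q->nr_cmds == 0 || q->cmds != NULL);

	for (i = 0; i < q->nr_cmds; i++)
		free(q->cmds[i].buf);
	free(q->cmds);
	q->cmds = NULL;
	q->nr_cmds = 0;
}
```

If that is right, the release path becomes safe after a failed 
allocation, but alloc_cmds() still returns an error, so the queue setup 
itself still fails.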

Is it possible that the allocation failure is caused by one of the leaks 
that have been fixed in nvme-of over the past year (I saw some in the git 
blame), and that those fixes haven't reached LTS yet? I can try to get a 
kdump the next time this issue happens to see why the allocation fails 
in the first place.

- Jonas

On 8/21/24 2:16 PM, Maurizio Lombardi wrote:
> On Wed, Aug 21, 2024 at 12:57, Maurizio Lombardi
> <mlombard at redhat.com> wrote:
>>
>>> Is the stack trace enough to go on? Is this already fixed in a newer
>>> kernel? Do you need me to gather more information?
>>
>> The stack trace was more than sufficient; the problem is still
>> reproducible on 6.11.0-rc4.
>> What happens is that the nvme-tcp target fails to allocate sufficient
>> memory for the commands, and the queue release procedure then
>> dereferences a NULL pointer.
>>
> 
> This should be sufficient to fix the crash:
> 
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 5bff0d5464d1..b37506ad851f 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1534,6 +1534,7 @@ static int nvmet_tcp_alloc_cmds(struct nvmet_tcp_queue *queue)
>                  nvmet_tcp_free_cmd(cmds + i);
>          kfree(cmds);
>   out:
> +       queue->nr_cmds = 0;
>          return ret;
>   }
> 
> Maurizio
> 


