Occasional kernel error with NVMe-oF TCP target
Jonas Konrad
me at yawk.at
Wed Aug 21 06:31:03 PDT 2024
Thanks!
So from what I understand, this patch fixes the "NULL pointer
dereference" part. However, the allocation failure and the associated
error still remain, yes? And I assume the nvme-of connection would still fail?
Is it possible that the allocation failure is caused by one of the leaks
that have been fixed in the past year in nvme-of (I saw some in the git
blame), and they haven't reached LTS yet? I can try getting a kdump next
time this issue happens to see why there is an allocation failure in the
first place.
- Jonas
On 8/21/24 2:16 PM, Maurizio Lombardi wrote:
> On Wed, Aug 21, 2024 at 12:57, Maurizio Lombardi
> <mlombard at redhat.com> wrote:
>>
>>> Is the stack trace enough to go on? Is this already fixed in a newer
>>> kernel? Do you need me to gather more information?
>>
>> The stack trace was more than sufficient; it's still reproducible on 6.11.0-rc4.
>> What happens is that the nvme-tcp target failed to allocate sufficient
>> memory for the commands, and the queue release procedure then
>> dereferences a NULL pointer.
>>
>
> This should be sufficient to fix the crash
>
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 5bff0d5464d1..b37506ad851f 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1534,6 +1534,7 @@ static int nvmet_tcp_alloc_cmds(struct nvmet_tcp_queue *queue)
>  		nvmet_tcp_free_cmd(cmds + i);
>  	kfree(cmds);
>  out:
> +	queue->nr_cmds = 0;
>  	return ret;
>  }
>
> Maurizio
>
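
The failure pattern described above can be illustrated with a small
userspace sketch. This is not the kernel code: demo_cmd, demo_queue,
demo_alloc_cmds, demo_release and the fail_at parameter are hypothetical
names invented for illustration, and malloc/free stand in for the kernel
allocators. It assumes the release path walks nr_cmds entries of cmds, so
that a stale nr_cmds combined with a NULL cmds array is what triggers the
NULL pointer dereference, and it mirrors the one-line fix by resetting
nr_cmds on the error path.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real nvmet-tcp structures. */
struct demo_cmd {
    void *buf;
};

struct demo_queue {
    int nr_cmds;            /* number of valid entries in cmds */
    struct demo_cmd *cmds;  /* stays NULL if allocation failed */
};

/*
 * Allocate nr commands. A fail_at value >= 0 simulates an allocation
 * failure at that index; pass -1 for no simulated failure.
 */
int demo_alloc_cmds(struct demo_queue *q, int nr, int fail_at)
{
    struct demo_cmd *cmds;
    int i, ret = -1;

    q->nr_cmds = nr;
    q->cmds = NULL;

    cmds = calloc((size_t)nr, sizeof(*cmds));
    if (!cmds)
        goto out;

    for (i = 0; i < nr; i++) {
        cmds[i].buf = (i == fail_at) ? NULL : malloc(64);
        if (!cmds[i].buf)
            goto out_free;
    }

    q->cmds = cmds;
    return 0;

out_free:
    /* Unwind only the entries that were allocated before the failure. */
    while (--i >= 0)
        free(cmds[i].buf);
    free(cmds);
out:
    /* Mirrors the fix: without this, release would walk a NULL array. */
    q->nr_cmds = 0;
    return ret;
}

/* Release every command the queue claims to own; returns the count freed. */
int demo_release(struct demo_queue *q)
{
    int i, freed = 0;

    for (i = 0; i < q->nr_cmds; i++) {
        free(q->cmds[i].buf);   /* dereferences q->cmds */
        freed++;
    }
    free(q->cmds);
    q->cmds = NULL;
    q->nr_cmds = 0;
    return freed;
}
```

With the reset in place, calling demo_release() after a failed
demo_alloc_cmds() iterates zero times and never touches the NULL array.
As in the patch, this only makes the error path safe; it does not address
why the allocation failed in the first place.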