Occasional kernel error with NVMe-oF TCP target

Maurizio Lombardi mlombard at redhat.com
Wed Aug 21 06:50:18 PDT 2024


On Wed, Aug 21, 2024 at 15:32, Jonas Konrad <me at yawk.at> wrote:
>
> Thanks!
>
> So from what I understand, this patch fixes the "NULL pointer
> dereference" part. However, the allocation failure and associated error
> still remains, yes? And I assume the nvme-of connection would still fail?

Yes, the connection will still fail, but at least the server won't crash.

>
> Is it possible that the allocation failure is caused by one of the leaks
> that have been fixed in the past year in nvme-of (I saw some in the git
> blame), and they haven't reached LTS yet? I can try getting a kdump next
> time this issue happens to see why there is an allocation failure in the
> first place.

Hmm, I am not sure. Looking at the memory status printed by the kernel,
it looks like the nvme-tcp driver was attempting an order-6 page
allocation:

[436736.043084] kworker/5:2H: page allocation failure: order:6,

2^6 * 4096 bytes = 262144 bytes = 256 kB
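To make the order arithmetic concrete, here is a minimal sketch, assuming 4 kB base pages (as on x86-64). The helper name order_to_bytes is hypothetical. It shows that an order-N request needs 2^N physically contiguous pages. Orders 0 to 10 correspond to the 4kB ... 4096kB columns in the free-area line quoted below.

```python
PAGE_SIZE = 4096  # assumption: 4 kB base pages, as on x86-64


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of one physically contiguous block of the given order."""
    return (1 << order) * page_size


# Orders 0..10 match the 4kB .. 4096kB columns of the kernel's free-area report.
for order in range(11):
    print(f"order {order:2d}: {order_to_bytes(order) // 1024:5d} kB")
```

For the failing request, order_to_bytes(6) gives 262144 bytes, i.e. 256 kB.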

----
Node 0 Normal: 277849*4kB (UME) 599680*8kB (UME)
861628*16kB (UME) 313346*32kB (UME) 27161*64kB (UME) 33*128kB (U)
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31464484kB
---
What makes me suspicious is that there are only a few free contiguous
128 kB chunks (33 of them) and none of any larger size, even though about
30 GB is free in total; this is why the 256 kB allocation fails. So I
suspect your system is suffering from severe memory fragmentation, but I
don't know the root cause or the solution.
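The fragmentation argument can be checked directly against the free-area line above. What follows is a simplified sketch. The function names are hypothetical, and the model ignores watermarks, migrate types and compaction, all of which the real page allocator also takes into account. It relies only on the buddy-allocator property that an order-N request can be served by splitting any free block of order N or higher. Having such a block is therefore necessary, though not sufficient.

```python
import re

# The free-area line from the allocation-failure report above, unwrapped.
FREE_AREA = (
    "Node 0 Normal: 277849*4kB (UME) 599680*8kB (UME) "
    "861628*16kB (UME) 313346*32kB (UME) 27161*64kB (UME) 33*128kB (U) "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31464484kB"
)


def parse_free_area(line: str) -> dict:
    """Map block size in kB -> number of free blocks, from 'COUNT*SIZEkB' tokens."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def can_satisfy(free: dict, order: int, page_kb: int = 4) -> bool:
    """Simplified check: a free block of at least the requested size must exist,
    because the buddy allocator can split a larger block but never merge
    non-adjacent smaller ones on demand."""
    need_kb = page_kb << order
    return any(count > 0 for size, count in free.items() if size >= need_kb)


free = parse_free_area(FREE_AREA)
total_kb = sum(size * count for size, count in free.items())
largest_kb = max(size for size, count in free.items() if count > 0)
print(f"total free: {total_kb} kB, largest free block: {largest_kb} kB")
print("order-6 (256 kB) satisfiable:", can_satisfy(free, 6))
```

The computed total matches the kernel's 31464484 kB figure, yet the largest free block is 128 kB. An order-6 request cannot be served from the free lists, while an order-5 (128 kB) request still could.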

Maurizio




More information about the Linux-nvme mailing list