cgroup null pointer dereference

Kamaljit Singh Kamaljit.Singh1 at wdc.com
Thu Apr 24 19:22:31 PDT 2025


Waiman,

>>>>> In my test env I've added a null check to 'child' and am re-running
>>>>> the long-term test.
>>>>> I'm wondering if this patch is sufficient to address any underlying
>>>>> issue or is just a band-aid.
>>>>> Please share any known patches or suggestions.
>>>>>     -    while (child != parent) {
>>>>>     +    while (child && child != parent) {
>>>> Child can become NULL only if the updated_next list isn't parent
>>>> terminated. This should not happen. A warning is needed if it really
>>>> happens. I will take a further look to see if there is a bug somewhere.
>>> My test re-ran for 36+ hours without any CPU lockups or NMIs. This
>>> patch seems to have helped.
>>>
>> I now see what is wrong. The cgroup_rstat_push_children() function is
>> supposed to be called with cgroup_rstat_lock held, but commit
>> 093c8812de2d3 ("cgroup: rstat: Cleanup flushing functions and
>> locking") changes that. Hence racing can corrupt the list. I will work
>> on a patch to fix that regression.
>
>It should also be in the v6.15-rc1 branch but is missing from the nvme
>branch that you are using. So you need to use a more up-to-date nvme
>branch, when available, to avoid this problem.
>
Thank you for finding that commit. I'll look for it.
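
For anyone following along, the guarded loop quoted above, together with
the warning suggested earlier in the thread, can be sketched in user-space
C. This is only an illustration and not the kernel code: struct node and
walk_updated_list() are hypothetical names, and only the updated_next link
and the loop condition are taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a cgroup; only the link used by the walk. */
struct node {
	const char *name;
	struct node *updated_next;	/* last entry is expected to point at the parent */
};

/*
 * Walk a list that should be terminated by 'parent'.
 * Returns the number of children visited, or -1 if the list ended in
 * NULL before reaching 'parent' (the list is not parent terminated).
 */
static int walk_updated_list(struct node *head, struct node *parent)
{
	struct node *child = head;
	int visited = 0;

	assert(parent != NULL);

	/* Same condition as the test patch: stop on NULL as well as on parent. */
	while (child && child != parent) {
		visited++;
		child = child->updated_next;
	}

	if (!child) {
		/* Report the broken list instead of silently hiding it. */
		fprintf(stderr, "warning: list not terminated by %s\n",
			parent->name);
		return -1;
	}
	return visited;
}
```

With a well-formed list the walk stops at the parent and returns the child
count. With a list that ends in NULL it returns -1 and prints a warning,
so the null check avoids the crash without masking the corruption.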

Christoph, Sagi, Keith, Others,
Can this commit be merged into the nvme-6.15 branch please?

Thanks & Regards,
Kamaljit


