[PATCH 1/1] fix left bit-shift overflow in __exclude_unnecessary_pages()
Alexander Egorenkov
egorenar at linux.ibm.com
Wed Sep 1 23:11:58 PDT 2021
Hi Kazu,
HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com> writes:
> -----Original Message-----
>> > -----Original Message-----
>> >> Whenever the variables compound_order or private become greater than
>> >> 31, left bit-shift of 1 overflows, and nr_pages becomes zero. If nr_pages
>> >> becomes 0 and pages are being excluded at the end of the PFN loop, the
>> >> else branch of the last if statement is entered and pfn is decremented by
>> >> 1 because nr_pages is 0. Finally, this causes the loop variable pfn to
>> >> be assigned the same value as before when the next loop iteration begins
>> >> which results in an infinite loop.
>> >>
>> >> This issue appeared on s390 64bit architecture with a dump of 16GB RAM.
>> >
>> > The patch looks good to me, but just out of curiosity, when do the
>> > compound_order or private become greater than 31 on s390?
>> >
>> > Thanks,
>> > Kazu
>> >
>>
>> I added some debug statements and this is what I got:
>>
>> compound_order 0
>> compound_order 1
>> compound_order 2
>> compound_order 3
>> compound_order 4
>> compound_order 5
>> compound_order 6
>> compound_order 7
>> compound_order 8
>> private 0
>> private 1
>> private 2
>> private 3
>> private 4
>> private 5
>> private 52
>> private 6
>> private 7
>> private 8
>>
>> It seems that private, not compound_order, is at fault here and
>> triggers the bug. I am not sure yet what exactly that means, or whether
>> there is another bug here which triggers this one :/
>
> Hmm, so makedumpfile will exclude many pages wrongly with the patch?
> Excluding pages wrongly is better than failing with an infinite loop,
> but not better than including pages wrongly, because it might lose
> necessary data for investigation.
>
> So I think we should also have a sanity check for private. AFAIK,
> the private value (buddy allocator's order) should be less than MAX_ORDER.
> If this is correct, we can use LENGTH(zone.free_area) in vmcoreinfo.
>
Completely agree with this. We need to fix the real cause of this and
not the effect. First I need to understand better what exactly the loop
is doing.
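To make the discussion concrete, here is a minimal sketch of such a sanity check. It is not the makedumpfile code: checked_nr_pages(), walk_pfns() and EXAMPLE_MAX_ORDER are hypothetical names, the value 11 is only a stand-in for whatever LENGTH(zone.free_area) reports in vmcoreinfo, and the walk treats every visited PFN as the head of a free block, which is a simplification. In C, shifting an int by its width or more is undefined behavior, so the sketch shifts an unsigned long and rejects any order that is out of range before shifting. A rejected order advances the walk by exactly one page and excludes nothing, so pfn can never step backwards.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for LENGTH(zone.free_area) from vmcoreinfo. */
#define EXAMPLE_MAX_ORDER 11UL

/*
 * Number of pages spanned by a block of the given order. If the order
 * fails the sanity check, *valid is set to 0 and 1 is returned, so the
 * caller advances by a single page and excludes nothing.
 */
unsigned long checked_nr_pages(unsigned long order, unsigned long max_order,
                               int *valid)
{
    *valid = order < max_order &&
             order < sizeof(unsigned long) * CHAR_BIT;
    return *valid ? 1UL << order : 1UL;
}

/*
 * Simplified skip-ahead walk over PFNs [0, end). orders[pfn] is the
 * order found at each visited PFN. Returns the number of loop
 * iterations and stores the number of excluded pages in *excluded.
 */
unsigned long walk_pfns(const unsigned long *orders, unsigned long end,
                        unsigned long max_order, unsigned long *excluded)
{
    unsigned long iterations = 0;

    *excluded = 0;
    for (unsigned long pfn = 0; pfn < end; pfn++) {
        int valid;
        unsigned long nr_pages =
            checked_nr_pages(orders[pfn], max_order, &valid);

        /* With nr_pages == 0 the line "pfn += nr_pages - 1" below
         * would undo the loop's pfn++ and spin forever. */
        assert(nr_pages >= 1);

        if (valid) {
            unsigned long n = nr_pages;

            if (n > end - pfn)  /* do not count past the range */
                n = end - pfn;
            *excluded += n;
        }
        pfn += nr_pages - 1;    /* the loop's pfn++ adds the last step */
        iterations++;
    }
    return iterations;
}
```

With orders {2, 0, 0, 0, 52, 0, 1, 0} and end = 8, the walk visits PFNs 0, 4, 5 and 6. The page with private 52 is kept rather than excluded, and the loop terminates.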
Regards
Alex