[PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)

Sourabh Jain sourabhjain at linux.ibm.com
Tue Jul 1 22:03:59 PDT 2025


Hello Kazu,

On 02/07/25 10:22, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Tao,
>
> On 2025/07/02 13:36, Tao Liu wrote:
>> Hi Kazu,
>>
>> On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
>> <k-hagio-ab at nec.com> wrote:
>>> On 2025/07/01 16:59, Tao Liu wrote:
>>>> Hi Kazu,
>>>>
>>>> Thanks for your comments!
>>>>
>>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com> wrote:
>>>>> Hi Tao,
>>>>>
>>>>> thank you for the patch.
>>>>>
>>>>> On 2025/06/25 11:23, Tao Liu wrote:
>>>>>> A vmcore corruption issue has been noticed on the powerpc arch [1]. It can be
>>>>>> reproduced with upstream makedumpfile.
>>>>>>
>>>>>> When the corrupted vmcore is analyzed with crash, the following error
>>>>>> messages are printed:
>>>>>>
>>>>>>         crash: compressed kdump: uncompress failed: 0
>>>>>>         crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>>>>>>         "hardirq thread_union"
>>>>>>         crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>>>>         crash: compressed kdump: uncompress failed: 0
>>>>>>
>>>>>> If the vmcore is generated without the --num-threads option, no such
>>>>>> errors are seen.
>>>>>>
>>>>>> With --num-threads=N enabled, N sub-threads are created. All
>>>>>> sub-threads are producers responsible for memory page processing, e.g.
>>>>>> compression. The main thread is the consumer responsible for
>>>>>> writing the compressed data into the file. page_flag_buf->ready is used to
>>>>>> synchronize the main thread and the sub-threads. When a sub-thread finishes
>>>>>> processing a page, it sets the ready flag to FLAG_READY. Meanwhile, the main
>>>>>> thread repeatedly checks the ready flags of all threads and breaks out of the
>>>>>> loop when it finds FLAG_READY.
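A minimal sketch of the producer/consumer handshake described above, written with C11 atomics and POSIX threads. It is not makedumpfile's code and not the v2 patch (whose diff is not quoted in this message): `struct slot`, `SLOT_UNUSED`/`SLOT_READY`, `run_pipeline()` and the integer payload are hypothetical simplifications. The idea it shows is that the ready flag is touched only through atomic loads and stores, and the producer's release store of READY orders the payload write before it, so a consumer whose acquire load observes READY is guaranteed to see the finished payload.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified states; makedumpfile's own definitions differ. */
enum { SLOT_UNUSED, SLOT_READY };

#define PAGES_PER_PRODUCER 1000L

struct slot {
	_Atomic int ready; /* handshake flag shared by one producer and the consumer */
	long data;         /* payload; stands in for a compressed page */
};

struct producer_arg {
	struct slot *slot;
	long id;
};

/* Sub-thread: "process" pages one at a time and hand each one to the consumer. */
static void *producer(void *p)
{
	struct producer_arg *arg = p;

	for (long i = 0; i < PAGES_PER_PRODUCER; i++) {
		/* Acquire: the consumer has finished reading the previous payload. */
		while (atomic_load_explicit(&arg->slot->ready, memory_order_acquire) != SLOT_UNUSED)
			sched_yield();
		arg->slot->data = arg->id * PAGES_PER_PRODUCER + i;
		/* Release: the payload write becomes visible no later than READY does. */
		atomic_store_explicit(&arg->slot->ready, SLOT_READY, memory_order_release);
	}
	return NULL;
}

/* Main thread as consumer: poll every slot and consume whichever one is READY.
 * Returns the sum of all payloads, which does not depend on thread timing. */
static long run_pipeline(int nr_threads)
{
	struct slot *slots = calloc(nr_threads, sizeof(*slots));
	struct producer_arg *args = calloc(nr_threads, sizeof(*args));
	pthread_t *tids = calloc(nr_threads, sizeof(*tids));
	long total = nr_threads * PAGES_PER_PRODUCER, consumed = 0, sum = 0;

	if (!slots || !args || !tids)
		abort();
	for (int t = 0; t < nr_threads; t++) {
		atomic_init(&slots[t].ready, SLOT_UNUSED);
		args[t] = (struct producer_arg){ .slot = &slots[t], .id = t };
		if (pthread_create(&tids[t], NULL, producer, &args[t]) != 0)
			abort();
	}
	while (consumed < total) {
		int progressed = 0;
		for (int t = 0; t < nr_threads; t++) {
			/* Acquire pairs with the producer's release store of READY. */
			if (atomic_load_explicit(&slots[t].ready, memory_order_acquire) != SLOT_READY)
				continue;
			sum += slots[t].data; /* stands in for writing to the dump file */
			consumed++;
			progressed = 1;
			/* Release: the payload read completes before the slot is reused. */
			atomic_store_explicit(&slots[t].ready, SLOT_UNUSED, memory_order_release);
		}
		if (!progressed)
			sched_yield();
	}
	for (int t = 0; t < nr_threads; t++)
		pthread_join(tids[t], NULL);
	assert(consumed == total);
	free(tids);
	free(args);
	free(slots);
	return sum;
}
```

Build with `-pthread`. Note that `run_pipeline(1)` still runs two threads (one producer plus the main consumer), which matches the --num-threads=1 case discussed in this thread. Unlike makedumpfile, the sketch does not care in which order pages are consumed; the returned sum is order-independent.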
>>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
>>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
>>>> easily reproduced on ppc64, which is where our QE reported it.
>>>> We have recently enabled --num-threads=N by default in RHEL, with N ==
>>>> nr_cpus in the 2nd kernel, so QE noticed the issue.
>>> I see, thank you for the information.
>>>
>>>>> Do you have any possible scenario that breaks a vmcore?  I could not
>>>>> think of it only by looking at the code.
>>>> I guess the issue has only been observed on ppc because of ppc's
>>>> memory model, multi-thread scheduling algorithm, etc. I'm not an expert
>>>> on those, so I cannot give a clear explanation, sorry...
>>> OK, I also can't think of a good way to debug this...
>>>
>>>> page_flag_buf->ready is an integer that is read and written by the main
>>>> thread and the sub-threads simultaneously. An assignment such as
>>>> page_flag_buf->ready = 1 might be compiled into several assembly
>>>> instructions. Without atomic r/w (memory) protection, reads and writes
>>>> may race within those few instructions, which causes the data
>>>> inconsistency. Frankly, the ppc assembly contains more instructions
>>>> than x86_64 for the same C code, which increases the likelihood of a
>>>> data race.
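To make "atomic r/w protection" concrete, here is a minimal sketch of an atomic state transition. All names are hypothetical and nothing here is taken from makedumpfile or the v2 patch. A check-then-set on a plain int, such as `if (state == UNUSED) state = FILLING;`, is a read followed by a separate write, so another thread can act in between; `atomic_compare_exchange_strong_explicit` performs the check and the update as one indivisible operation, so exactly one of several racing threads can claim the slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical slot states, for illustration only. */
enum { SLOT_UNUSED, SLOT_FILLING, SLOT_READY };

static _Atomic int slot_state = SLOT_UNUSED;
static _Atomic int nr_winners = 0;

/* Atomically move UNUSED -> FILLING. The comparison and the store cannot be
 * interleaved with another thread's, so at most one caller succeeds. */
static int try_claim(_Atomic int *state)
{
	int expected = SLOT_UNUSED;

	return atomic_compare_exchange_strong_explicit(state, &expected, SLOT_FILLING,
						       memory_order_acq_rel,
						       memory_order_acquire);
}

static void *contender(void *unused)
{
	(void)unused;
	if (try_claim(&slot_state))
		atomic_fetch_add(&nr_winners, 1);
	return NULL;
}

/* Start n threads racing for the same slot; return how many claimed it. */
static int race_for_slot(int n)
{
	pthread_t tids[64];

	assert(n > 0 && n <= 64);
	atomic_store(&slot_state, SLOT_UNUSED);
	atomic_store(&nr_winners, 0);
	for (int i = 0; i < n; i++)
		if (pthread_create(&tids[i], NULL, contender, NULL) != 0)
			abort();
	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&nr_winners);
}
```

However many threads race, `race_for_slot()` should report exactly one winner; with a plain int and a separate check and store, two threads could both observe UNUSED and both "claim" the slot.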
>>>>
>>>> We can observe the issue without the help of crash: generate vmcores
>>>> from the same core file with and without the --num-threads option,
>>>> then compare them with "cmp vmcore1 vmcore2". cmp reports that the
>>>> two vmcores differ, which is unexpected.
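The cmp check above can also be done programmatically. A small sketch in C; the function name `first_diff` and its return convention are assumptions for illustration. It returns the 1-based offset of the first differing byte, matching cmp's byte numbering, but it does not report line numbers as cmp does.

```c
#include <assert.h>
#include <stdio.h>

/* Return the 1-based offset of the first byte at which the two files differ,
 * 0 if they are identical, or -1 on an open/read error. If one file is a
 * prefix of the other, return the offset just past the end of the shorter
 * one (cmp reports "EOF on ..." in that case instead). */
static long long first_diff(const char *path1, const char *path2)
{
	FILE *f1, *f2;
	long long off = 0, result = 0;

	assert(path1 && path2);
	f1 = fopen(path1, "rb");
	f2 = fopen(path2, "rb");
	if (!f1 || !f2) {
		result = -1;
		goto out;
	}
	for (;;) {
		int c1 = fgetc(f1), c2 = fgetc(f2);

		off++;
		if (c1 != c2) {
			result = off;
			break;
		}
		if (c1 == EOF) /* both files ended at the same offset */
			break;
	}
	if (ferror(f1) || ferror(f2))
		result = -1;
out:
	if (f1)
		fclose(f1);
	if (f2)
		fclose(f2);
	return result;
}
```

For two dumps generated from the same core file, a nonzero result from `first_diff("/tmp/out1", "/tmp/out2")` corresponds to the "differ: byte N" line cmp prints.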
>>>>
>>>>> And, just out of curiosity: is the issue reproduced with
>>>>> makedumpfile compiled with -O0 too?
>>>> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
>>>> share my findings...
>>> Thanks. We have to fix this anyway, but I want a clue to think about a
>>> possible scenario...
>> 1) Compiled with -O2 flag:
>>
>> [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
>> Copying data                                      : [100.0 %] /  eta: 0s
>>
>> The dumpfile is saved to /tmp/out1.
>>
>> makedumpfile Completed.
>> [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d 31 -l ~/vmcore /tmp/out2
>> Copying data                                      : [100.0 %] |  eta: 0s
>> Copying data                                      : [100.0 %] \  eta: 0s
>>
>> The dumpfile is saved to /tmp/out2.
>>
>> makedumpfile Completed.
>> [root at ibm-p10-01-lp45 makedumpfile]# cd /tmp
>> [root at ibm-p10-01-lp45 tmp]# cmp out1 out2
>> out1 out2 differ: byte 20786414, line 108064
>>
>> 2) Compiled with -O0 flag:
>>
>> [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
>> Copying data                                      : [100.0 %] /  eta: 0s
>>
>> The dumpfile is saved to /tmp/out3.
>>
>> makedumpfile Completed.
>> [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d 31 -l ~/vmcore /tmp/out4
>> Copying data                                      : [100.0 %] |  eta: 0s
>> Copying data                                      : [100.0 %] \  eta: 0s
>>
>> The dumpfile is saved to /tmp/out4.
>>
>> makedumpfile Completed.
>> [root at ibm-p10-01-lp45 makedumpfile]# cd /tmp
>> [root at ibm-p10-01-lp45 tmp]# cmp out3 out4
>> out3 out4 differ: byte 23948282, line 151739
>>
>> It looks to me like -O0 and -O2 make no difference in this case. If there
>> were no problem, the /tmp/outX files generated in single- and multi-thread
>> mode would be exactly identical; however, cmp reports differences. With
>> the v2 patch applied, there is no such difference:
>>
>> [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
>> Copying data                                      : [100.0 %] /  eta: 0s
>>
>> The dumpfile is saved to /tmp/out5.
>>
>> makedumpfile Completed.
>> [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d 31 -l ~/vmcore /tmp/out6
>> Copying data                                      : [100.0 %] |  eta: 0s
>> Copying data                                      : [100.0 %] \  eta: 0s
>>
>> The dumpfile is saved to /tmp/out6.
>>
>> makedumpfile Completed.
>> [root at ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
>> [root at ibm-p10-01-lp45 makedumpfile]#
> Thank you for testing! Sorry, one more thing:
> does --num-threads=1 break the vmcore?

I was able to reproduce this issue with --num-threads=1. The reason is
that when --num-threads is specified, makedumpfile runs separate producer
and consumer threads: with --num-threads=1 there is one producer sub-thread
plus the main (consumer) thread. So even with --num-threads=1,
multithreading is still in effect.

Thanks,
Sourabh Jain


