[PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
Tao Liu
ltao at redhat.com
Tue Jul 1 22:03:01 PDT 2025
Hi Kazu,
On Wed, Jul 2, 2025 at 4:52 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com> wrote:
>
> Hi Tao,
>
> On 2025/07/02 13:36, Tao Liu wrote:
> > Hi Kazu,
> >
> > On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
> > <k-hagio-ab at nec.com> wrote:
> >>
> >> On 2025/07/01 16:59, Tao Liu wrote:
> >>> Hi Kazu,
> >>>
> >>> Thanks for your comments!
> >>>
> >>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com> wrote:
> >>>>
> >>>> Hi Tao,
> >>>>
> >>>> thank you for the patch.
> >>>>
> >>>> On 2025/06/25 11:23, Tao Liu wrote:
> >>>>> A vmcore corruption issue has been noticed on the powerpc arch [1]. It
> >>>>> can be reproduced with upstream makedumpfile.
> >>>>>
> >>>>> When analyzing the corrupted vmcore with crash, the following error
> >>>>> messages are output:
> >>>>>
> >>>>> crash: compressed kdump: uncompress failed: 0
> >>>>> crash: read error: kernel virtual address: c0001e2d2fe48000 type: "hardirq thread_union"
> >>>>> crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >>>>> crash: compressed kdump: uncompress failed: 0
> >>>>>
> >>>>> If the vmcore is generated without the --num-threads option, no such
> >>>>> errors are seen.
> >>>>>
> >>>>> With --num-threads=N enabled, N sub-threads are created. All
> >>>>> sub-threads are producers, responsible for mm page processing, e.g.
> >>>>> compression. The main thread is the consumer, responsible for
> >>>>> writing the compressed data to the file. page_flag_buf->ready is used
> >>>>> to sync the main thread and the sub-threads. When a sub-thread finishes
> >>>>> processing a page, it sets the ready flag to FLAG_READY. Meanwhile, the
> >>>>> main thread repeatedly checks the ready flags of all threads, and
> >>>>> breaks out of the loop when it finds FLAG_READY.
> >>>>
> >>>> I've tried to reproduce the issue, but I couldn't on x86_64.
> >>>
> >>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> >>> easily reproduced on the ppc64 arch, which is where our QE reported it.
> >>> We recently enabled --num-threads=N in RHEL by default, with N ==
> >>> nr_cpus in the 2nd kernel, so QE noticed the issue.
> >>
> >> I see, thank you for the information.
> >>
> >>>
> >>>>
> >>>> Do you have any possible scenario that breaks a vmcore? I could not
> >>>> think of it only by looking at the code.
> >>>
> >>> I guess the issue has only been observed on ppc because of ppc's
> >>> memory model, multi-thread scheduling algorithm, etc. I'm not an expert
> >>> on those, so I cannot give a clear explanation, sorry...
> >>
> >> OK, I also can't think of a good way to debug this..
> >>
> >>>
> >>> The page_flag_buf->ready is an integer that is read and written by the
> >>> main thread and the sub-threads simultaneously. An assignment such as
> >>> page_flag_buf->ready = 1 might be compiled into several assembly
> >>> instructions. Without atomic r/w (memory) protection, reads and writes
> >>> might race within those few instructions, causing data
> >>> inconsistency. Frankly, ppc assembly uses more instructions
> >>> than x86_64 for the same C code, which increases the chance of a
> >>> data race.
> >>>
> >>> We can observe the issue without the help of crash: generate two
> >>> vmcores from the same core file, one with and one without the
> >>> --num-threads option, then compare them with "cmp vmcore1 vmcore2".
> >>> cmp reports differing bytes for the two vmcores, which is
> >>> unexpected.
> >>>
> >>>>
> >>>> and this is just out of curiosity, is the issue reproduced with
> >>>> makedumpfile compiled with -O0 too?
> >>>
> >>> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
> >>> share my findings...
> >>
> >> Thanks, we have to fix this anyway, I want a clue to think about a
> >> possible scenario..
> >
> > 1) Compiled with -O2 flag:
> >
> > [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
> > Copying data : [100.0 %] /
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out1.
> >
> > makedumpfile Completed.
> > [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d 31 -l ~/vmcore /tmp/out2
> > Copying data : [100.0 %] |
> > eta: 0s
> > Copying data : [100.0 %] \
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out2.
> >
> > makedumpfile Completed.
> > [root at ibm-p10-01-lp45 makedumpfile]# cd /tmp
> > [root at ibm-p10-01-lp45 tmp]# cmp out1 out2
> > out1 out2 differ: byte 20786414, line 108064
> >
> > 2) Compiled with -O0 flag:
> >
> > [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
> > Copying data : [100.0 %] /
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out3.
> >
> > makedumpfile Completed.
> > [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d 31 -l ~/vmcore /tmp/out4
> > Copying data : [100.0 %] |
> > eta: 0s
> > Copying data : [100.0 %] \
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out4.
> >
> > makedumpfile Completed.
> > [root at ibm-p10-01-lp45 makedumpfile]# cd /tmp
> > [root at ibm-p10-01-lp45 tmp]# cmp out3 out4
> > out3 out4 differ: byte 23948282, line 151739
> >
> > It looks to me like -O0/-O2 make no difference in this case. If there
> > were no problem, the /tmp/outX files generated in single- and
> > multi-thread mode should be exactly the same, but cmp reports
> > differences. With the v2 patch applied, there is no such difference:
> >
> > [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
> > Copying data : [100.0 %] /
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out5.
> >
> > makedumpfile Completed.
> > [root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d 31 -l ~/vmcore /tmp/out6
> > Copying data : [100.0 %] |
> > eta: 0s
> > Copying data : [100.0 %] \
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out6.
> >
> > makedumpfile Completed.
> > [root at ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
> > [root at ibm-p10-01-lp45 makedumpfile]#
>
> thank you for testing! sorry one more thing,
> does --num-threads=1 break the vmcore?
Yes:
[root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out7
Copying data : [100.0 %] /
eta: 0s
The dumpfile is saved to /tmp/out7.
makedumpfile Completed.
[root at ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=1 -d 31 -l ~/vmcore /tmp/out8
Copying data : [100.0 %] -
eta: 0s
Copying data : [100.0 %] /
eta: 0s
The dumpfile is saved to /tmp/out8.
makedumpfile Completed.
[root at ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out7 /tmp/out8
/tmp/out7 /tmp/out8 differ: byte 11119019, line 49418
>
> Thanks,
> Kazu
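
For reference, the ready-flag handshake described in the quoted patch
description can be sketched in isolation as below. This is only a minimal
illustration, not the v2 patch and not makedumpfile code: it assumes the
corruption comes from missing ordering between the page data writes and the
ready flag, which has not been confirmed in this thread. Only the names
"ready" and FLAG_READY come from the discussion; FLAG_UNUSED, struct slot,
page_slot, producer() and run_handshake_demo() are hypothetical names made up
for the example, and C11 <stdatomic.h> is just one possible way to get
release/acquire ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

/* FLAG_READY is the name used in the thread; FLAG_UNUSED is an assumption. */
enum { FLAG_UNUSED = 0, FLAG_READY = 1 };

/* Hypothetical stand-in for one page_flag_buf entry plus its page data. */
struct slot {
    atomic_int ready;          /* handshake flag shared by both threads */
    unsigned long pfn;         /* payload written by the producer */
    unsigned char data[64];    /* stands in for the compressed page */
};

static struct slot page_slot;
static int total_pages;

/* Producer (sub-thread): fill the slot, then publish it with a release store. */
static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < total_pages; i++) {
        /* Wait until the consumer has handed the slot back. */
        while (atomic_load_explicit(&page_slot.ready,
                                    memory_order_acquire) != FLAG_UNUSED)
            sched_yield();
        page_slot.pfn = (unsigned long)i;
        memset(page_slot.data, i & 0xff, sizeof(page_slot.data));
        /* Release: the payload writes above become visible before the flag. */
        atomic_store_explicit(&page_slot.ready, FLAG_READY,
                              memory_order_release);
    }
    return NULL;
}

/* Consumer (main thread): returns the number of inconsistent pages seen,
 * or -1 if the producer thread could not be created. */
int run_handshake_demo(int n_pages)
{
    pthread_t tid;
    int mismatches = 0;

    assert(n_pages >= 0);
    total_pages = n_pages;
    atomic_init(&page_slot.ready, FLAG_UNUSED);
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;

    for (int i = 0; i < n_pages; i++) {
        /* Acquire: once FLAG_READY is seen, the payload is safe to read. */
        while (atomic_load_explicit(&page_slot.ready,
                                    memory_order_acquire) != FLAG_READY)
            sched_yield();
        if (page_slot.pfn != (unsigned long)i)
            mismatches++;
        for (size_t j = 0; j < sizeof(page_slot.data); j++) {
            if (page_slot.data[j] != (unsigned char)(i & 0xff)) {
                mismatches++;
                break;
            }
        }
        /* Hand the slot back to the producer. */
        atomic_store_explicit(&page_slot.ready, FLAG_UNUSED,
                              memory_order_release);
    }
    pthread_join(tid, NULL);
    return mismatches;
}
```

Calling run_handshake_demo(20000) from a small main() and building with
"gcc -O2 -pthread" should return 0 mismatches. With a plain int flag and no
ordering, a weakly ordered CPU would be allowed to make the flag visible
before the payload, which is the kind of inconsistency suspected above.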