[PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)

Tao Liu ltao at redhat.com
Tue Jul 1 00:59:53 PDT 2025


Hi Kazu,

Thanks for your comments!

On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com> wrote:
>
> Hi Tao,
>
> thank you for the patch.
>
> On 2025/06/25 11:23, Tao Liu wrote:
> > A vmcore corruption issue has been noticed on the powerpc arch [1].
> > It can be reproduced with upstream makedumpfile.
> >
> > When analyzing the corrupt vmcore using crash, the following error
> > messages are output:
> >
> >      crash: compressed kdump: uncompress failed: 0
> >      crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> >      "hardirq thread_union"
> >      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >      crash: compressed kdump: uncompress failed: 0
> >
> > If the vmcore is generated without the --num-threads option, no such
> > errors are observed.
> >
> > With --num-threads=N enabled, there will be N sub-threads created. All
> > sub-threads are producers, responsible for mm page processing, e.g.
> > compression. The main thread is the consumer, responsible for writing
> > the compressed data into the file. page_flag_buf->ready is used to
> > sync the main thread and the sub-threads. When a sub-thread finishes
> > processing a page, it sets the ready flag to FLAG_READY. In the
> > meantime, the main thread loops over all threads' ready flags and
> > breaks out of the loop when it finds FLAG_READY.
>
> I've tried to reproduce the issue, but I couldn't on x86_64.

Yes, I cannot reproduce it on x86_64 either, but the issue is very
easily reproduced on the ppc64 arch, which is where our QE reported
it. Recently we have enabled --num-threads=N in RHEL by default, with
N == nr_cpus in the 2nd kernel, so QE noticed the issue.

>
> Do you have any possible scenario that breaks a vmcore?  I could not
> think of it only by looking at the code.

I guess the issue having only been observed on ppc might be due to
ppc's memory model, multi-thread scheduling algorithm, etc. I'm not
an expert on those, so I cannot give a clear explanation, sorry...

page_flag_buf->ready is an integer that is read and written by the
main and sub-threads simultaneously, and an assignment such as
page_flag_buf->ready = 1 may be compiled into several assembly
instructions. Without atomic read/write (memory-ordering) protection,
reads and writes can race within those few instructions, causing the
data inconsistency. Frankly, the ppc assembly consists of more
instructions than x86_64 for the same C code, which enlarges the
window for the data race.
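
To illustrate the pattern, here is a minimal sketch of the
producer/consumer handshake (simplified names, not the actual
makedumpfile code; the GCC __atomic builtins are the same ones the
patch uses):

    /* Shared flag: written by a sub-thread, read by the main thread. */
    enum { FLAG_UNUSED, FLAG_FILLING, FLAG_READY };
    static int ready = FLAG_UNUSED;

    /* Racy version: plain store/load. The compiler and CPU may reorder
     * these against the stores that fill the page buffer, so the main
     * thread can observe FLAG_READY before the page data is complete. */
    void producer_racy(void) { ready = FLAG_READY; }
    int  consumer_racy(void) { return ready == FLAG_READY; }

    /* Fixed version, as in the patch: sequentially-consistent atomics
     * order the flag update against the buffer writes on both sides. */
    void producer_fixed(void)
    {
        __atomic_store_n(&ready, FLAG_READY, __ATOMIC_SEQ_CST);
    }
    int consumer_fixed(void)
    {
        return __atomic_load_n(&ready, __ATOMIC_SEQ_CST) == FLAG_READY;
    }

On x86_64 an aligned int store is a single, strongly-ordered
instruction, while ppc64's weaker memory model needs the barriers the
builtins emit, which may be why only ppc64 shows the corruption.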

We can observe the issue without the help of crash: generate two dump
files from the same source vmcore, one with and one without the
--num-threads option, and compare the binary output. Running
"cmp vmcore1 vmcore2" reports differing bytes between the two
vmcores, which is unexpected.
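
For example (the compression flag, dump level and thread count below
are just the ones we happened to use, not required to reproduce):

    makedumpfile -c -d 31 /proc/vmcore vmcore1
    makedumpfile -c -d 31 --num-threads=8 /proc/vmcore vmcore2
    cmp vmcore1 vmcore2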

>
> and this is just out of curiosity, is the issue reproduced with
> makedumpfile compiled with -O0 too?

Sorry, I haven't done the -O0 experiment yet. I can do it tomorrow
and share my findings...

Thanks,
Tao Liu

>
> Thanks,
> Kazu
>
> >
> > page_flag_buf->ready is read and written by the main and sub-threads
> > simultaneously, but it is unprotected and unsafe. I have tested that
> > both a mutex and atomic r/w can fix this issue. This patch takes
> > atomic r/w for its simplicity.
> >
> > [1]: https://github.com/makedumpfile/makedumpfile/issues/15
> >
> > Tested-by: Sourabh Jain <sourabhjain at linux.ibm.com>
> > Signed-off-by: Tao Liu <ltao at redhat.com>
> > ---
> >
> > v1 -> v2: Add crash's error messages to the commit log
> >
> > ---
> >   makedumpfile.c | 21 ++++++++++++++-------
> >   1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index 2d3b08b..bac45c2 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -8621,7 +8621,8 @@ kdump_thread_function_cyclic(void *arg) {
> >
> >               while (buf_ready == FALSE) {
> >                       pthread_testcancel();
> > -                     if (page_flag_buf->ready == FLAG_READY)
> > +                     if (__atomic_load_n(&page_flag_buf->ready,
> > +                                     __ATOMIC_SEQ_CST) == FLAG_READY)
> >                               continue;
> >
> >                       /* get next dumpable pfn */
> > @@ -8637,7 +8638,8 @@ kdump_thread_function_cyclic(void *arg) {
> >                       info->current_pfn = pfn + 1;
> >
> >                       page_flag_buf->pfn = pfn;
> > -                     page_flag_buf->ready = FLAG_FILLING;
> > +                     __atomic_store_n(&page_flag_buf->ready, FLAG_FILLING,
> > +                                     __ATOMIC_SEQ_CST);
> >                       pthread_mutex_unlock(&info->current_pfn_mutex);
> >                       sem_post(&info->page_flag_buf_sem);
> >
> > @@ -8726,7 +8728,8 @@ kdump_thread_function_cyclic(void *arg) {
> >                       page_flag_buf->index = index;
> >                       buf_ready = TRUE;
> >   next:
> > -                     page_flag_buf->ready = FLAG_READY;
> > +                     __atomic_store_n(&page_flag_buf->ready, FLAG_READY,
> > +                                     __ATOMIC_SEQ_CST);
> >                       page_flag_buf = page_flag_buf->next;
> >
> >               }
> > @@ -8855,7 +8858,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                        * current_pfn is used for recording the value of pfn when checking the pfn.
> >                        */
> >                       for (i = 0; i < info->num_threads; i++) {
> > -                             if (info->page_flag_buf[i]->ready == FLAG_UNUSED)
> > +                             if (__atomic_load_n(&info->page_flag_buf[i]->ready,
> > +                                             __ATOMIC_SEQ_CST) == FLAG_UNUSED)
> >                                       continue;
> >                               temp_pfn = info->page_flag_buf[i]->pfn;
> >
> > @@ -8863,7 +8867,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                                * count how many threads have reached the end.
> >                                */
> >                               if (temp_pfn >= end_pfn) {
> > -                                     info->page_flag_buf[i]->ready = FLAG_UNUSED;
> > +                                     __atomic_store_n(&info->page_flag_buf[i]->ready,
> > +                                             FLAG_UNUSED, __ATOMIC_SEQ_CST);
> >                                       end_count++;
> >                                       continue;
> >                               }
> > @@ -8885,7 +8890,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                        * If the page_flag_buf is not ready, the pfn recorded may be changed.
> >                        * So we should recheck.
> >                        */
> > -                     if (info->page_flag_buf[consuming]->ready != FLAG_READY) {
> > +                     if (__atomic_load_n(&info->page_flag_buf[consuming]->ready,
> > +                                     __ATOMIC_SEQ_CST) != FLAG_READY) {
> >                               clock_gettime(CLOCK_MONOTONIC, &new);
> >                               if (new.tv_sec - last.tv_sec > WAIT_TIME) {
> >                                       ERRMSG("Can't get data of pfn.\n");
> > @@ -8927,7 +8933,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                               goto out;
> >                       page_data_buf[index].used = FALSE;
> >               }
> > -             info->page_flag_buf[consuming]->ready = FLAG_UNUSED;
> > +             __atomic_store_n(&info->page_flag_buf[consuming]->ready,
> > +                             FLAG_UNUSED, __ATOMIC_SEQ_CST);
> >               info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next;
> >       }
> >   finish:



