[PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)

Thu Jul 3 07:31:00 PDT 2025

On Tue, 1 Jul 2025 19:59:53 +1200
Tao Liu <ltao at redhat.com> wrote:

> Hi Kazu,
> 
> Thanks for your comments!
> 
> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾　一仁) <k-hagio-ab at nec.com> wrote:
> >
> > Hi Tao,
> >
> > thank you for the patch.
> >
> > On 2025/06/25 11:23, Tao Liu wrote:  
> > > A vmcore corruption issue has been noticed on the powerpc arch [1]. It
> > > can be reproduced with upstream makedumpfile.
> > >
> > > When analyzing the corrupted vmcore using crash, the following error
> > > messages are printed:
> > >
> > >      crash: compressed kdump: uncompress failed: 0
> > >      crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> > >      "hardirq thread_union"
> > >      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> > >      crash: compressed kdump: uncompress failed: 0
> > >
> > > If the vmcore is generated without the --num-threads option, then no
> > > such errors are noticed.
> > >
> > > With --num-threads=N enabled, N sub-threads are created. All
> > > sub-threads are producers, which are responsible for mm page processing,
> > > e.g. compression. The main thread is the consumer, which is responsible
> > > for writing the compressed data into the file. page_flag_buf->ready is
> > > used to sync the main thread and the sub-threads. When a sub-thread
> > > finishes page processing, it sets the ready flag to FLAG_READY. In the
> > > meantime, the main thread repeatedly checks the ready flags of all
> > > threads, and breaks out of the loop when it finds FLAG_READY.
> >
> > I've tried to reproduce the issue, but I couldn't on x86_64.  
> 
> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> easily reproduced on the ppc64 arch, which is where our QE reported it.

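Yes, this is expected. X86 implements a strongly ordered memory model
(TSO): stores become visible to other CPUs in program order, so a
consumer that observes the new ready flag also observes the data
written before it. ppc64 is weakly ordered and gives no such guarantee
without explicit barriers.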

FWIW the current code is wrong even on X86, because it does nothing to
prevent compiler optimizations. The compiler is then allowed to reorder
instructions so that the write to page_flag_buf->ready happens after
other writes; with a bit of bad scheduling luck, the consumer thread
may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
Note that thanks to how compilers are designed (today), this issue is
more or less hypothetical. Nevertheless, the use of atomics fixes it,
because they also serve as memory barriers.
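
To illustrate the point about atomics acting as barriers, here is a
minimal sketch of the producer/consumer hand-off using C11 atomics.
It is not the actual patch. The names pfn, ready and FLAG_READY come
from this thread; struct page_flag_sketch, FLAG_UNUSED, producer(),
consume() and run_handoff() are hypothetical names made up for the
example, and a single producer stands in for the N sub-threads.

```c
#include <pthread.h>
#include <stdatomic.h>

/* FLAG_READY is named in the thread; FLAG_UNUSED is an assumed
 * placeholder for "not ready yet". */
enum { FLAG_UNUSED = 0, FLAG_READY = 1 };

/* Hypothetical stand-in for the real page_flag_buf structure. */
struct page_flag_sketch {
	unsigned long long pfn;	/* plain payload, written before ready */
	atomic_int ready;	/* synchronization flag */
};

static struct page_flag_sketch buf;

/* Producer: fill in the payload, then publish it. The release store
 * keeps both the compiler and the CPU from moving the pfn write
 * after the store to ready. */
static void *producer(void *arg)
{
	(void)arg;
	buf.pfn = 930;
	atomic_store_explicit(&buf.ready, FLAG_READY, memory_order_release);
	return NULL;
}

/* Consumer: spin until the flag is set. The acquire load pairs with
 * the release store, so the pfn read below cannot return a stale
 * value once FLAG_READY has been observed. */
static unsigned long long consume(void)
{
	while (atomic_load_explicit(&buf.ready, memory_order_acquire)
	       != FLAG_READY)
		;	/* busy-wait, as the main thread does */
	return buf.pfn;
}

/* Run one hand-off; returns the pfn seen by the consumer, or 0 if
 * the producer thread could not be created. */
static unsigned long long run_handoff(void)
{
	pthread_t tid;
	unsigned long long pfn;

	atomic_store(&buf.ready, FLAG_UNUSED);
	if (pthread_create(&tid, NULL, producer, NULL) != 0)
		return 0;
	pfn = consume();
	pthread_join(tid, NULL);
	return pfn;
}
```

Calling run_handoff() from main() and building with -pthread should
yield 930 on both strongly and weakly ordered machines. With a plain
int flag and plain stores, the same code has a data race and the
consumer is allowed to see FLAG_READY together with an old pfn.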

Petr T