[PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)

Thu Jul 3 15:35:20 PDT 2025

Hi Petr,

On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik at suse.com> wrote:
>
> On Tue, 1 Jul 2025 19:59:53 +1200
> Tao Liu <ltao at redhat.com> wrote:
>
> > Hi Kazu,
> >
> > Thanks for your comments!
> >
> > On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾　一仁) <k-hagio-ab at nec.com> wrote:
> > >
> > > Hi Tao,
> > >
> > > thank you for the patch.
> > >
> > > On 2025/06/25 11:23, Tao Liu wrote:
> > > > A vmcore corruption issue has been noticed on the powerpc arch [1]. It
> > > > can be reproduced with upstream makedumpfile.
> > > >
> > > > When analyzing the corrupted vmcore using crash, the following error
> > > > messages are output:
> > > >
> > > >      crash: compressed kdump: uncompress failed: 0
> > > >      crash: read error: kernel virtual address: c0001e2d2fe48000  type: "hardirq thread_union"
> > > >      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> > > >      crash: compressed kdump: uncompress failed: 0
> > > >
> > > > If the vmcore is generated without the num-threads option, no such
> > > > errors are noticed.
> > > >
> > > > With --num-threads=N enabled, N sub-threads are created. All
> > > > sub-threads are producers responsible for mm page processing, e.g.
> > > > compression. The main thread is the consumer responsible for
> > > > writing the compressed data into the file. page_flag_buf->ready is
> > > > used to sync the main thread and the sub-threads. When a sub-thread
> > > > finishes page processing, it sets the ready flag to FLAG_READY.
> > > > Meanwhile, the main thread loops over all threads checking their ready
> > > > flags, and breaks out of the loop when it finds FLAG_READY.
> > >
> > > I've tried to reproduce the issue, but I couldn't on x86_64.
> >
> > Yes, I cannot reproduce it on x86_64 either, but the issue is very
> > easily reproduced on the ppc64 arch, which is where our QE reported it.
>
> Yes, this is expected. x86 implements a strongly ordered memory model,
> so other CPUs observe a CPU's stores in the same order in which they
> were issued.
>
> FWIW the current code is wrong even on X86, because it does nothing to
> prevent compiler optimizations. The compiler is then allowed to reorder
> instructions so that the write to page_flag_buf->ready happens after
> other writes; with a bit of bad scheduling luck, the consumer thread
> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
> Note that thanks to how compilers are designed (today), this issue is
> more or less hypothetical. Nevertheless, the use of atomics fixes it,
> because they also serve as memory barriers.

Thanks a lot for your detailed explanation, it's very helpful! I
hadn't thought of the possibility of instruction reordering, or that
atomic_rw prevents it.
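For reference, a minimal sketch of the handoff described in the patch, written with C11 <stdatomic.h> rather than whatever atomic_rw expands to in makedumpfile. Each producer thread owns a one-entry slot and publishes it with a release store to its ready flag; the consumer polls all flags with acquire loads, so once it sees FLAG_READY it is guaranteed to see the matching pfn and data. The struct layout, FLAG_UNUSED, fake_compress() and run_handoff_demo() are hypothetical names for illustration only, not makedumpfile code, and the sketch does not model writing to a file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag values; only FLAG_READY appears in the patch. */
enum { FLAG_UNUSED = 0, FLAG_READY = 1 };

#define MAX_THREADS 8

/* Hypothetical one-entry buffer per producer, loosely modeled on page_flag_buf. */
struct slot {
	unsigned long pfn;   /* payload written by the producer */
	unsigned long data;  /* stands in for the compressed page */
	atomic_int ready;    /* handoff flag between producer and consumer */
};

static struct slot slots[MAX_THREADS];
static int g_nthreads;
static unsigned long g_npages;

/* Stand-in for compression: any deterministic function of pfn works. */
static unsigned long fake_compress(unsigned long pfn)
{
	return pfn * 2654435761UL;
}

static void *producer(void *arg)
{
	int id = (int)(intptr_t)arg;
	struct slot *s = &slots[id];

	for (unsigned long pfn = id; pfn < g_npages; pfn += g_nthreads) {
		/* Acquire pairs with the consumer's release store, so the
		 * consumer has finished reading the previous payload. */
		while (atomic_load_explicit(&s->ready, memory_order_acquire) != FLAG_UNUSED)
			sched_yield();
		s->pfn = pfn;
		s->data = fake_compress(pfn);
		/* Release: the two stores above become visible to any thread
		 * that observes FLAG_READY with an acquire load. */
		atomic_store_explicit(&s->ready, FLAG_READY, memory_order_release);
	}
	return NULL;
}

/* Runs the handoff and returns how many inconsistent pages the consumer saw. */
unsigned long run_handoff_demo(int nthreads, unsigned long npages)
{
	pthread_t tids[MAX_THREADS];
	unsigned long consumed = 0, bad = 0;

	assert(nthreads >= 1 && nthreads <= MAX_THREADS);
	g_nthreads = nthreads;
	g_npages = npages;
	/* No other threads exist yet; pthread_create() publishes these stores. */
	for (int i = 0; i < nthreads; i++)
		atomic_store_explicit(&slots[i].ready, FLAG_UNUSED, memory_order_relaxed);

	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[i], NULL, producer, (void *)(intptr_t)i) != 0)
			abort();

	/* Consumer: poll every producer's flag, as the main thread does. */
	while (consumed < npages) {
		for (int i = 0; i < nthreads; i++) {
			struct slot *s = &slots[i];

			if (atomic_load_explicit(&s->ready, memory_order_acquire) != FLAG_READY)
				continue;
			/* The acquire load guarantees pfn/data are the new values. */
			if (s->data != fake_compress(s->pfn))
				bad++;
			consumed++;
			/* Hand the slot back; release orders our reads before reuse. */
			atomic_store_explicit(&s->ready, FLAG_UNUSED, memory_order_release);
		}
	}

	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return bad;
}
```

Replacing the atomic flag with a plain int reintroduces the window described above: nothing then forces the pfn/data stores to become visible before the flag, neither at the compiler level nor on a weakly ordered CPU such as POWER (and in C11 such a plain-int race is undefined behavior). On x86, GCC and Clang typically compile the release store and acquire load to plain moves, so the main cost there is the restriction on compiler reordering.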

Thanks,
Tao Liu

>
> Petr T
>