Bug report: kernel paniced when system hibernates

Alexandre Ghiti alexghiti at rivosinc.com
Wed May 17 07:55:53 PDT 2023


On Wed, May 17, 2023 at 1:28 PM Conor Dooley <conor.dooley at microchip.com> wrote:
>
> Hey Alex,
>
> On Wed, May 17, 2023 at 10:58:02AM +0200, Alexandre Ghiti wrote:
> > On Tue, May 16, 2023 at 1:12 PM Alexandre Ghiti <alexghiti at rivosinc.com> wrote:
>
> > > On Tue, May 16, 2023 at 11:24 AM Song Shuai <suagrfillet at gmail.com> wrote:
> > > I actually removed this flag a few years ago, and I have to admit that
> > > I need to check if that's necessary: the goal of commit 3335068f8721
> > > ("riscv: Use PUD/P4D/PGD pages for the linear mapping") is to expose
> > > the "right" start of DRAM so that we can align virtual and physical
> > > addresses on a 1GB boundary.
> > >
> > > So I have to check if a nomap region is actually added as a
> > > memblock.memory.regions[] or not: if yes, that's perfect, let's add
> > > the nomap attributes to the PMP regions, otherwise, I don't think that
> > > is a good solution.
> >
> > So here is the current linear mapping without nomap in openSBI:
> >
> > ---[ Linear mapping ]---
> > 0xff60000000000000-0xff60000000200000    0x0000000080000000         2M
> > PMD     D A G . . W R V
> > 0xff60000000200000-0xff60000000e00000    0x0000000080200000        12M
> > PMD     D A G . . . R V
> >
> > And below the linear mapping with nomap in openSBI:
> >
> > ---[ Linear mapping ]---
> > 0xff60000000080000-0xff60000000200000    0x0000000080080000      1536K
> > PTE     D A G . . W R V
> > 0xff60000000200000-0xff60000000e00000    0x0000000080200000        12M
> > PMD     D A G . . . R V
> >
> > So adding nomap does not misalign virtual and physical addresses, it
> > prevents the usage of 1GB page for this area though, so that's a
> > solution, we just lose this 1GB page here.
> >
> > But even though that may be the fix, I think we also need to fix that
> > in the kernel as it would break compatibility with certain versions of
> > openSBI *if* we fix openSBI...So here are a few solutions:
> >
> > 1. we can mark all "mmode_resv" nodes in the device tree as nomap,
> > before the linear mapping is established (IIUC, those nodes are added
> > by openSBI to advertise PMP regions)
> >     -> This amounts to the same fix as opensbi and we lose the 1GB hugepage.
>
> AFAIU, losing the 1 GB hugepage is a regression, which would make this
> not an option, right?

Not sure this is a real regression, I'd rather avoid it, but as
mentioned in my first answer, Mike Rapoport showed that it was making
no difference performance-wise...

>
> > 2. we can tweak pfn_is_nosave function to *not* save pfn corresponding
> > to PMP regions
> >     -> We don't lose the 1GB hugepage \o/
> > 3. we can use register_nosave_region() to not save the "mmode_resv"
> > regions (x86 does that
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/arch/x86/kernel/e820.c#L753)
> >     -> We don't lose the 1GB hugepage \o/
> > 4. Given JeeHeng pointer to
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/kernel/power/snapshot.c#L1340,
> > we can mark those pages as non-readable and make the hibernation
> > process not save those pages
> >     -> Very late-in-the-day idea, not sure what it's worth, we also
> > lose the 1GB hugepage...
>
> Ditto here re: introducing another regression.
>
> > To me, the best solution is 3 as it would prepare for other similar
> > issues later, it is similar to x86 and it allows us to keep 1GB
> > hugepages.
> >
> > I have been thinking, and to me nomap does not provide anything since
> > the kernel should not address this memory range, so if it does, we
> > must fix the kernel.
> >
> > Let me know what you all think, I'll be preparing a PoC of 3 in the meantime!
>
> #3 would probably get my vote too. It seems like you could use it
> dynamically if there was to be a future other provider of "mmode_resv"
> regions, rather than doing something location-specific.
>
> We should probably document these opensbi reserved memory nodes though
> in a dt-binding or w/e if we are going to be relying on them to not
> crash!

Yes, you're right, let's see what Atish and Anup think!

Thanks for your quick answers Conor and Song, really appreciated!

Alex

>
> Thanks for working on this,
> Conor.
>



More information about the linux-riscv mailing list