[PATCH] percpu: km: ensure it is used with NOMMU (either UP or SMP)

Dennis Zhou dennis at kernel.org
Tue Dec 14 09:26:16 PST 2021


Hello,

On Tue, Dec 14, 2021 at 05:29:22PM +0100, Geert Uytterhoeven wrote:
> Hi Vladimir and Dennis,
> 
> On Wed, Dec 1, 2021 at 12:53 PM Vladimir Murzin <vladimir.murzin at arm.com> wrote:
> > On 11/30/21 5:41 PM, Dennis Zhou wrote:
> > > On Tue, Nov 30, 2021 at 05:29:54PM +0000, Vladimir Murzin wrote:
> > >> Currently, NOMMU configurations pull in the km allocator via the !SMP
> > >> dependency because most of them are UP, yet SMP+NOMMU pulls in the vm
> > >> allocator, which:
> > >>
> > >> * may lead to broken build [1]
> > >> * ...or a non-working runtime due to [2]
> > >>
> > >> It looks like the SMP+NOMMU case was overlooked in bbddff054587 ("percpu:
> > >> use percpu allocator on UP too") so restore that.
> > >>
> > >> [1]
> > >> For ARM SMP+NOMMU (R-class cores)
> > >>
> > >> arm-none-linux-gnueabihf-ld: mm/percpu.o: in function `pcpu_post_unmap_tlb_flush':
> > >> mm/percpu-vm.c:188: undefined reference to `flush_tlb_kernel_range'
> > >>
> > >> [2]
> > >> static inline
> > >> int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> > >>                 pgprot_t prot, struct page **pages, unsigned int page_shift)
> > >> {
> > >>        return -EINVAL;
> > >> }
> > >>
> > >> Signed-off-by: Vladimir Murzin <vladimir.murzin at arm.com>
> > >> ---
> > >>  mm/Kconfig | 3 +--
> > >>  1 file changed, 1 insertion(+), 2 deletions(-)
> > >>
> > >> diff --git a/mm/Kconfig b/mm/Kconfig
> > >> index d16ba92..66331e0 100644
> > >> --- a/mm/Kconfig
> > >> +++ b/mm/Kconfig
> > >> @@ -425,9 +425,8 @@ config THP_SWAP
> > >>  # UP and nommu archs use km based percpu allocator
> > >>  #
> > >>  config NEED_PER_CPU_KM
> > >> -    depends on !SMP
> > >>      bool
> > >> -    default y
> > >> +    default !SMP || !MMU
> > >>
> > >
> > > Should this be `depends on !SMP || !MMU` with default yes? Because with
> > > SMP && MMU, it shouldn't be an option to run with percpu-km.
> >
> > IIUC these are equivalent; the truth table would not change whether it is
> > under "depends" or "default":
> >
> > SMP    MMU   NEED_PER_CPU_KM
> >  y      y    !y || !y => n || n => n
> >  y      n    !y || !n => n || y => y
> >  n      y    !n || !y => y || n => y
> >  n      n    !n || !n => y || y => y
> >
> > >
> > >>  config CLEANCACHE
> > >>      bool "Enable cleancache driver to cache clean pages if tmem is present"
> > >> --
> > >> 2.7.4
> > >>
> > >
> > > It's interesting to me that this is all coming up at once. Earlier this
> > > month I had the same conversation with people involved with sh [1].
> > >
> > > [1] https://lore.kernel.org/linux-sh/YY7tp5attRyK42Zk@fedora/
> > >
> > > I can pull this shortly once I see whatever happened to linux-sh.
> >
> > Ahh, good to know! Adding SH folks here (start of discussion [0]). I see you came
> > to the same conclusion, right?
> >
> > IIRC, RISC-V also has SMP+NOMMU, so adding them as well.
> 
> I had seen the j-Core thread, but completely forgot about
> Canaan K210 (RV64 SMP+NOMMU).
> 
> This became commit 3583521aabac76e5 ("percpu: km: ensure it is used
> with NOMMU (either UP or SMP)").  And now booting K210 prints:
> 
>     percpu: wasting 10 pages per chunk
> 
> a) Is this bad?

It's not great. Can you share the boot log line with the prefix
"pcpu-alloc" [1]?

I'm a little surprised here, because this means it's not allocating
against the right atomic size. I don't necessarily think it's an issue
caused by switching from percpu-vm to percpu-km.

> b) What exactly was this fixing, and how would I trigger the bad case
>    on K210 before, if it was affected at all?
> 

In v5.14, I merged Roman's request for percpu depopulation [2]. This
required calls to flush the TLB. There is an abstraction layer:
percpu-vm vs percpu-km. So if an architecture uses percpu-vm but has no
MMU AND doesn't stub out the TLB flush call appropriately, then it
fails. This happened on the arm and sh architectures. RISC-V might be
stubbing it out appropriately, so they never saw the issue.

Now, there is also a bigger caveat with using percpu-vm without an MMU.
In percpu-vm, we allocate pages on demand and map them into
pre-allocated vmas. This means there are 2 scenarios that I haven't
looked into more deeply: 1) the vma allocation itself ends up allocating
physical pages, or 2) we've been lucky that percpu allocates backwards
in the vma, so we haven't hit a collision problem yet.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git/tree/mm/percpu.c#n2507
[2] https://lore.kernel.org/lkml/20210419225047.3415425-1-dennis@kernel.org/

Thanks,
Dennis

> >
> > [0] https://lore.kernel.org/linux-mm/20211130172954.129587-1-vladimir.murzin@arm.com/T/
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds



More information about the linux-riscv mailing list