[PATCH 0/5 v11] KASan for Arm

Ard Biesheuvel ardb at kernel.org
Wed Jul 1 08:09:23 EDT 2020


On Wed, 1 Jul 2020 at 10:32, Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Wed, 1 Jul 2020 at 09:43, Ard Biesheuvel <ardb at kernel.org> wrote:
> >
> > On Wed, 1 Jul 2020 at 06:53, Florian Fainelli <f.fainelli at gmail.com> wrote:
> > >
> > >
> > >
> > > On 6/30/2020 2:41 PM, Ard Biesheuvel wrote:
> > > > On Tue, 30 Jun 2020 at 15:39, Linus Walleij <linus.walleij at linaro.org> wrote:
> > > >>
> > > >> This is the v11 version of the KASan patches for ARM.
> > > >>
> > > >> The main changes from the v10 version is:
> > > >>
> > > >> - LPAE now compiles and works again, at least Versatile Express
> > > >>   Cortex A15 TC1 in QEMU (which is the LPAE system I have
> > > >>   access to).
> > > >>
> > > >> - Rewrite some of the page directory initialization after
> > > >>   helpful feedback from Mike Rapoport and Russell King.
> > > >>
> > > >> Also minor improvements to commit messages and comments
> > > >> in the code so it is clear (for most cases I hope) why
> > > >> some ifdefs etc are there.
> > > >>
> > > >> All tested platforms from ARMv4 thru ARMv7 work fine. I
> > > >> have not been able to re-test with the Qualcomm DragonBoard
> > > >> APQ8060 yet, but I suspect the problem there is that the
> > > >> DT parser code reaches out into non-kernel memory and
> > > >> needs some de-instrumentation, possibly combined with the
> > > >> memory holding the device tree getting corrupted or reused
> > > >> before we have a chance to parse it.
> > > >>
> > > >> Abbott Liu (1):
> > > >>   ARM: Define the virtual space of KASan's shadow region
> > > >>
> > > >> Andrey Ryabinin (3):
> > > >>   ARM: Disable KASan instrumentation for some code
> > > >>   ARM: Replace string mem* functions for KASan
> > > >>   ARM: Enable KASan for ARM
> > > >>
> > > >> Linus Walleij (1):
> > > >>   ARM: Initialize the mapping of KASan shadow memory
> > > >>
> > > >
> > > > Hi,
> > > >
> > > > I needed the changes below to make this work on a 16 core GICv3
> > > > QEMU/KVM vm with 8 GB of RAM
> > > >
> > > > Without masking start, I get a strange error where kasan_alloc_block()
> > > > runs out of memory, probably because one of the do..while stop
> > > > conditions fails to trigger and we loop until we run out of lowmem.
> > > >
> > > > The TLB flush is really essential to make any of these page table
> > > > modifications take effect right away, and strange things can happen if
> > > > you don't. I also saw a crash in the DT unflatten code without this
> > > > change, but that is probably because it is simply the code that runs
> > > > immediately after.
> > > >
> > > > If you see anything like
> > > >
> > > > Unable to handle kernel paging request at virtual address b744077c
> > > > [b744077c] *pgd=80000040206003, *pmd=6abf5003, *pte=c000006abb471f
> > > >
> > > > where the CPU faults on an address that appears to have a valid
> > > > mapping at each level, it means that the page table walker was using a
> > > > stale TLB entry to do the translation, triggered a fault and when we
> > > > look at the page tables in software, everything looks like it is
> > > > supposed to.
> > >
> > > Thanks Ard, this allows me to boot successfully to a prompt on a BCM7278
> > > system whereas we would have an error before while unflattening the DT.
> > >
> > > Now there are still other systems that fail booting with the error log
> > > attached previously, but it is not clear yet to me why this is happening
> > > as it does not seem to depend on the memory ranges only as I initially
> > > thought.
> >
> > It seems to me that for LPAE, we are not copying enough of the level 2
> > early shadow tables: a level 2 table covers 512 MB, which is exactly
> > the size of the KASAN shadow region for a 4 GB address space. However,
> > the shadow region is not 512 MB aligned, and so the early shadow
> > necessarily covers two level 2 tables. Could you try the following
> > please?
> >
> > diff --git a/arch/arm/mm/kasan_init.c b/arch/arm/mm/kasan_init.c
> > index 535dce42e59d..3c9c37a59b57 100644
> > --- a/arch/arm/mm/kasan_init.c
> > +++ b/arch/arm/mm/kasan_init.c
> > @@ -27,7 +27,7 @@
> >
> >  static pgd_t tmp_pgd_table[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
> >
> > -pmd_t tmp_pmd_table[PTRS_PER_PMD] __page_aligned_bss;
> > +static pmd_t tmp_pmd_table[2][PTRS_PER_PMD] __page_aligned_bss;
> >
> >  static __init void *kasan_alloc_block(size_t size, int node)
> >  {
> > @@ -231,13 +231,15 @@ void __init kasan_init(void)
> >          * to the proper swapper_pg_dir.
> >          */
> >         memcpy(tmp_pgd_table, swapper_pg_dir, sizeof(tmp_pgd_table));
> > -#ifdef CONFIG_ARM_LPAE
> > -       memcpy(tmp_pmd_table,
> > -              pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)),
> > -              sizeof(tmp_pmd_table));
> > -       set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START)],
> > -               __pgd(__pa(tmp_pmd_table) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
> > -#endif
> > +       if (IS_ENABLED(CONFIG_ARM_LPAE)) {
> > +               memcpy(tmp_pmd_table,
> > +                      pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)),
> > +                      sizeof(tmp_pmd_table));
> > +               set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START)],
> > +                       __pgd(__pa(&tmp_pmd_table[0]) | PMD_TYPE_TABLE
> > | L_PGD_SWAPPER));
> > +               set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START) + 1],
> > +                       __pgd(__pa(&tmp_pmd_table[1]) | PMD_TYPE_TABLE
> > | L_PGD_SWAPPER));
> > +       }
> >         cpu_switch_mm(tmp_pgd_table, &init_mm);
> >         clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>
> Hum maybe not. KASAN_SHADOW_START != KASAN_SHADOW_OFFSET in this case,
> and AIUI, the kernel's KASAN shadow region is [b6e00000, bf000000),
> which falls entirely inside a naturally aligned 512 MB region. IOW,
> pgd_index(KASAN_SHADOW_START) == pgd_index(KASAN_SHADOW_END), and we
> should probably add a build_bug_on() there to ensure that this remains
> the case.
>
> However, your crash log does suggest that no early shadow mapping
> exists for address 0xecff0000 (shadowed at 0xbc9ffe00), and I suspect
> that the clear_pgds() call is implicated in this, but I am not sure
> how exactly.
>
> In any case, I found another issue:
>
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000040000000-0x00000000ffab1fff]
> [    0.000000]   node   0: [mem 0x00000000ffab2000-0x00000000ffc73fff]
> [    0.000000]   node   0: [mem 0x00000000ffc74000-0x00000000ffffffff]
> ...
> [    0.000000] kasan: populating shadow for b7000000, bd000000
> [    0.000000] kasan: populating shadow for aef56400, bd000000
> [    0.000000] kasan: populating shadow for aef8e800, bd000000
> [    0.000000] kasan: populating shadow for b6f00000, b7000000
>
> i.e., the two highmem ranges are not disregarded as they should, due
> to the bogus __va() translations and the truncation by the (void *)
> casts.
>
>
> Fix below:


I pushed these changes and a few more to

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm-kasan-v11



More information about the linux-arm-kernel mailing list