[PATCH 0/5 v11] KASan for Arm
Ard Biesheuvel
ardb at kernel.org
Wed Jul 1 08:09:23 EDT 2020
On Wed, 1 Jul 2020 at 10:32, Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Wed, 1 Jul 2020 at 09:43, Ard Biesheuvel <ardb at kernel.org> wrote:
> >
> > On Wed, 1 Jul 2020 at 06:53, Florian Fainelli <f.fainelli at gmail.com> wrote:
> > >
> > >
> > >
> > > On 6/30/2020 2:41 PM, Ard Biesheuvel wrote:
> > > > On Tue, 30 Jun 2020 at 15:39, Linus Walleij <linus.walleij at linaro.org> wrote:
> > > >>
> > > >> This is the v11 version of the KASan patches for ARM.
> > > >>
> > > > >> The main changes from the v10 version are:
> > > >>
> > > >> - LPAE now compiles and works again, at least Versatile Express
> > > >> Cortex A15 TC1 in QEMU (which is the LPAE system I have
> > > >> access to).
> > > >>
> > > >> - Rewrite some of the page directory initialization after
> > > >> helpful feedback from Mike Rapoport and Russell King.
> > > >>
> > > >> Also minor improvements to commit messages and comments
> > > >> in the code so it is clear (for most cases I hope) why
> > > >> some ifdefs etc are there.
> > > >>
> > > >> All tested platforms from ARMv4 thru ARMv7 work fine. I
> > > >> have not been able to re-test with the Qualcomm DragonBoard
> > > >> APQ8060 yet, but I suspect the problem there is that the
> > > >> DT parser code reaches out into non-kernel memory and
> > > >> needs some de-instrumentation, possibly combined with the
> > > >> memory holding the device tree getting corrupted or reused
> > > >> before we have a chance to parse it.
> > > >>
> > > >> Abbott Liu (1):
> > > >> ARM: Define the virtual space of KASan's shadow region
> > > >>
> > > >> Andrey Ryabinin (3):
> > > >> ARM: Disable KASan instrumentation for some code
> > > >> ARM: Replace string mem* functions for KASan
> > > >> ARM: Enable KASan for ARM
> > > >>
> > > >> Linus Walleij (1):
> > > >> ARM: Initialize the mapping of KASan shadow memory
> > > >>
> > > >
> > > > Hi,
> > > >
> > > > I needed the changes below to make this work on a 16 core GICv3
> > > > QEMU/KVM vm with 8 GB of RAM
> > > >
> > > > Without masking start, I get a strange error where kasan_alloc_block()
> > > > runs out of memory, probably because one of the do..while stop
> > > > conditions fails to trigger and we loop until we run out of lowmem.
> > > >
> > > > The TLB flush is really essential to make any of these page table
> > > > modifications take effect right away, and strange things can happen if
> > > > you don't. I also saw a crash in the DT unflatten code without this
> > > > change, but that is probably because it is simply the code that runs
> > > > immediately after.
> > > >
> > > > If you see anything like
> > > >
> > > > Unable to handle kernel paging request at virtual address b744077c
> > > > [b744077c] *pgd=80000040206003, *pmd=6abf5003, *pte=c000006abb471f
> > > >
> > > > where the CPU faults on an address that appears to have a valid
> > > > mapping at each level, it means that the page table walker was using a
> > > > stale TLB entry to do the translation, triggered a fault and when we
> > > > look at the page tables in software, everything looks like it is
> > > > supposed to.
> > >
> > > Thanks Ard, this allows me to boot successfully to a prompt on a BCM7278
> > > system whereas we would have an error before while unflattening the DT.
> > >
> > > Now there are still other systems that fail booting with the error log
> > > attached previously, but it is not clear yet to me why this is happening
> > > as it does not seem to depend on the memory ranges only as I initially
> > > thought.
> >
> > It seems to me that for LPAE, we are not copying enough of the level 2
> > early shadow tables: a level 2 table covers 512 MB, which is exactly
> > the size of the KASAN shadow region for a 4 GB address space. However,
> > the shadow region is not 512 MB aligned, and so the early shadow
> > necessarily covers two level 2 tables. Could you try the following
> > please?
> >
> > diff --git a/arch/arm/mm/kasan_init.c b/arch/arm/mm/kasan_init.c
> > index 535dce42e59d..3c9c37a59b57 100644
> > --- a/arch/arm/mm/kasan_init.c
> > +++ b/arch/arm/mm/kasan_init.c
> > @@ -27,7 +27,7 @@
> >
> > static pgd_t tmp_pgd_table[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
> >
> > -pmd_t tmp_pmd_table[PTRS_PER_PMD] __page_aligned_bss;
> > +static pmd_t tmp_pmd_table[2][PTRS_PER_PMD] __page_aligned_bss;
> >
> > static __init void *kasan_alloc_block(size_t size, int node)
> > {
> > @@ -231,13 +231,15 @@ void __init kasan_init(void)
> > * to the proper swapper_pg_dir.
> > */
> > memcpy(tmp_pgd_table, swapper_pg_dir, sizeof(tmp_pgd_table));
> > -#ifdef CONFIG_ARM_LPAE
> > - memcpy(tmp_pmd_table,
> > - pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)),
> > - sizeof(tmp_pmd_table));
> > - set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START)],
> > - __pgd(__pa(tmp_pmd_table) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
> > -#endif
> > + if (IS_ENABLED(CONFIG_ARM_LPAE)) {
> > + memcpy(tmp_pmd_table,
> > + pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)),
> > + sizeof(tmp_pmd_table));
> > + set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START)],
> > + __pgd(__pa(&tmp_pmd_table[0]) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
> > + set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START) + 1],
> > + __pgd(__pa(&tmp_pmd_table[1]) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
> > + }
> > cpu_switch_mm(tmp_pgd_table, &init_mm);
> > clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>
> Hum maybe not. KASAN_SHADOW_START != KASAN_SHADOW_OFFSET in this case,
> and AIUI, the kernel's KASAN shadow region is [b6e00000, bf000000),
> which falls entirely inside a naturally aligned 512 MB region. IOW,
> pgd_index(KASAN_SHADOW_START) == pgd_index(KASAN_SHADOW_END), and we
> should probably add a BUILD_BUG_ON() there to ensure that this remains
> the case.
>
> However, your crash log does suggest that no early shadow mapping
> exists for address 0xecff0000 (shadowed at 0xbc9ffe00), and I suspect
> that the clear_pgds() call is implicated in this, but I am not sure
> how exactly.
>
> In any case, I found another issue:
>
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000ffab1fff]
> [ 0.000000] node 0: [mem 0x00000000ffab2000-0x00000000ffc73fff]
> [ 0.000000] node 0: [mem 0x00000000ffc74000-0x00000000ffffffff]
> ...
> [ 0.000000] kasan: populating shadow for b7000000, bd000000
> [ 0.000000] kasan: populating shadow for aef56400, bd000000
> [ 0.000000] kasan: populating shadow for aef8e800, bd000000
> [ 0.000000] kasan: populating shadow for b6f00000, b7000000
>
> i.e., the two highmem ranges are not disregarded as they should be, due
> to the bogus __va() translations and the truncation by the (void *)
> casts.
>
>
> Fix below:
I pushed these changes and a few more to
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm-kasan-v11