[PATCH 0/5 v11] KASan for Arm

Florian Fainelli f.fainelli at gmail.com
Wed Jul 1 00:53:22 EDT 2020



On 6/30/2020 2:41 PM, Ard Biesheuvel wrote:
> On Tue, 30 Jun 2020 at 15:39, Linus Walleij <linus.walleij at linaro.org> wrote:
>>
>> This is the v11 version of the KASan patches for ARM.
>>
>> The main changes from the v10 version are:
>>
>> - LPAE now compiles and works again, at least Versatile Express
>>   Cortex A15 TC1 in QEMU (which is the LPAE system I have
>>   access to).
>>
>> - Rewrite some of the page directory initialization after
>>   helpful feedback from Mike Rapoport and Russell King.
>>
>> Also minor improvements to commit messages and comments
>> in the code so it is clear (for most cases I hope) why
>> some ifdefs etc are there.
>>
>> All tested platforms from ARMv4 thru ARMv7 work fine. I
>> have not been able to re-test with the Qualcomm DragonBoard
>> APQ8060 yet, but I suspect the problem there is that the
>> DT parser code reaches out into non-kernel memory and
>> needs some de-instrumentation, possibly combined with the
>> memory holding the device tree getting corrupted or reused
>> before we have a chance to parse it.
>>
>> Abbott Liu (1):
>>   ARM: Define the virtual space of KASan's shadow region
>>
>> Andrey Ryabinin (3):
>>   ARM: Disable KASan instrumentation for some code
>>   ARM: Replace string mem* functions for KASan
>>   ARM: Enable KASan for ARM
>>
>> Linus Walleij (1):
>>   ARM: Initialize the mapping of KASan shadow memory
>>
> 
> Hi,
> 
> I needed the changes below to make this work on a 16 core GICv3
> QEMU/KVM vm with 8 GB of RAM
> 
> Without masking start, I get a strange error where kasan_alloc_block()
> runs out of memory, probably because one of the do..while stop
> conditions fails to trigger and we loop until we run out of lowmem.
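
For illustration only, here is a userspace sketch of how an unmasked start
address can defeat a do..while stop condition of the form "addr != end".
This is not the code from the patch set: STEP, STEP_MASK, walk() and the
iteration limit are hypothetical names, and I am assuming the loop advances
by a fixed step and compares against an aligned end.

```c
#include <stdio.h>

#define STEP      0x1000UL        /* hypothetical step size (one 4 KiB page) */
#define STEP_MASK (~(STEP - 1))   /* clears the offset bits below STEP */

/*
 * Walk [addr, end) in STEP-sized increments using the same shape of stop
 * condition as a typical page table walk: "addr != end".
 * Returns the number of iterations, or -1 if 'limit' iterations were done
 * without the stop condition ever becoming true.
 */
static long walk(unsigned long addr, unsigned long end, long limit)
{
	long n = 0;

	do {
		if (n == limit)
			return -1;  /* stands in for "ran out of memory" */
		n++;
		addr += STEP;
	} while (addr != end);

	return n;
}

/* Same walk, but with the start address masked down to a STEP boundary. */
static long walk_masked(unsigned long start, unsigned long end, long limit)
{
	return walk(start & STEP_MASK, end, limit);
}
```

With start = 0x10000234 and end = 0x10004000, walk() never sees addr equal
to end and gives up at the limit, while walk_masked() stops after 4 steps.
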
> 
> The TLB flush is really essential to make any of these page table
> modifications take effect right away, and strange things can happen if
> you don't. I also saw a crash in the DT unflatten code without this
> change, but that is probably because it is simply the code that runs
> immediately after.
> 
> If you see anything like
> 
> Unable to handle kernel paging request at virtual address b744077c
> [b744077c] *pgd=80000040206003, *pmd=6abf5003, *pte=c000006abb471f
> 
> where the CPU faults on an address that appears to have a valid
> mapping at each level, it means that the page table walker was using a
> stale TLB entry to do the translation, triggered a fault and when we
> look at the page tables in software, everything looks like it is
> supposed to.
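
As a rough way to check "valid mapping at each level" from such a log line,
the sketch below tests the low two bits of each printed entry. It assumes
the values are LPAE long-descriptor entries (the 64-bit pgd and pte values
suggest so), where bit 0 is the valid bit and bits [1:0] == 0b11 mark a
table entry at the first two levels or a page entry at the last level. The
helper names are hypothetical and this is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 set: the descriptor is valid. */
static bool desc_valid(uint64_t desc)
{
	return (desc & 0x1) != 0;
}

/*
 * Bits [1:0] == 0b11: a table descriptor at the pgd/pmd level, or a page
 * descriptor at the pte level, in the long-descriptor format.
 */
static bool desc_table_or_page(uint64_t desc)
{
	return (desc & 0x3) == 0x3;
}

/* True if all three levels of the walk point at a next level or a page. */
static bool walk_looks_valid(uint64_t pgd, uint64_t pmd, uint64_t pte)
{
	return desc_table_or_page(pgd) &&
	       desc_table_or_page(pmd) &&
	       desc_table_or_page(pte);
}
```

For the values in the log above (0x80000040206003, 0x6abf5003 and
0xc000006abb471f) every level passes this check, which matches the
description of a fault on an address that looks fully mapped.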

Thanks Ard, with these changes I can boot successfully to a prompt on a
BCM7278 system, which previously hit an error while unflattening the DT.

Other systems still fail to boot with the error log I attached previously,
and it is not yet clear to me why: the failure does not seem to depend only
on the memory ranges, as I initially thought.
--
Florian


