[PATCH] Test for riscv fixes

Fri Oct 6 04:38:05 PDT 2023

Hi Mark,

On 02/10/2023 15:41, Mark Rutland wrote:
> On Mon, Oct 02, 2023 at 09:13:52AM +0200, Alexandre Ghiti wrote:
>> Hi Edward,
>>
>> On Sat, Sep 30, 2023 at 1:06 AM Edward AD<twuufnxlz at gmail.com>  wrote:
>>> Hi Alexandre,
>>>
>>> On Fri, 29 Sep 2023 10:25:59 +0200 Alexandre Ghiti<alexghiti at rivosinc.com>  wrote:
>>>> I'm still not convinced this will fix the kasan out-of-bounds
>>>> accesses, the page can be valid but the read can happen at an offset
>>>> not initialized and trigger such errors right? I still think there is
>>>> something weird about the stack frame, as to me this should not happen
>>>> (but admittedly I don't know much about that).
>>> The added check can confirm that the physical page is invalid (whether it is a
>>> vmalloc allocated page or a slab allocated page), and exit the for loop when it is invalid.
>> Yes, but to me this is not what happens in the bug report you link:
>>
>> | BUG: KASAN: out-of-bounds in walk_stackframe+0x130/0x2f2
>> arch/riscv/kernel/stacktrace.c:59
>> | Read of size 8 at addr ff20000006d37c38 by task swapper/1/0
>>
>> So the read at address ff20000006d37c38 is not "normal" according to
>> KASAN (you can see there is no trap, meaning the physical mapping
>> exists).
>>
>> | The buggy address belongs to the virtual mapping at
>> |  [ff20000006d30000, ff20000006d39000) created by:
>> | kernel_clone+0x118/0x896 kernel/fork.c:2909
>>
>> The virtual address is legitimate since the vma exists ^
>>
>> | The buggy address belongs to the physical page:
>> | page:ff1c00000250dbc0 refcount:1 mapcount:0 mapping:0000000000000000
>> index:0x0 pfn:0x9436f
>>
>> And the physical page also exists ^
>>
>> So I insist, checking that a physical mapping exists to exit the loop
>> is not enough, to me, the error here is that the backtrace goes "too
>> far" at an address where nothing was written before and then KASAN
>> complains about that, again, we don't take any page fault here so it's
>> not a problem of existing physical mapping.
> Yep!
>
> I believe what's happening here is one task unwinding another (starting from
> whatever gets saved in switch_to()), and there's nothing that prevents that
> other task from running concurrently and modifying/poisoning its stack. In
> general trying to unwind a remote stack is racy and broken, but we're stuck
> with a few bits of the kernel tryingto do that occasionally and so the arch
> code needs to handle that without blowing up.

Thanks for that, I had already fixed the "imprecise" unwinder (when we 
don't have a frame pointer) using READ_ONCE_NOCHECK() but I had not this 
use case in mind, so I'll fix that too.

> For KASAN specifically you'll need to access the stack with unchecked accesses
> (e.g. using READ_ONCE_NOCHECK() to read the struct stackframe), and you'll
> probably want to add some explicit checks that pointers are within stack bounds
> since concurrent modification (or corruption) could result in entirely bogus
> pointers.
>
> I *think* that we do the right thing on arm64, so you might want to take a look
> at arm64's unwinder in arch/arm64/kernel/stacktrace.c,
> arch/arm64/include/asm/stacktrace.h, and
> arch/arm64/include/asm/stacktrace/common.h.

And I'll check that for the stack bounds check.

Thanks again,

Alex

>
> Mark.