RISC-V Linux kernel not booting up with KASAN enabled

Alexandre Ghiti alex at ghiti.fr
Fri Mar 3 06:16:38 PST 2023


On 3/3/23 06:44, Dmitry Vyukov wrote:
> On Thu, 2 Mar 2023 at 21:11, Alexandre Ghiti <alex at ghiti.fr> wrote:
>> +cc Dmitry and kasan-dev, in case they know about this but I did not
>> find anything related
> Hard to say anything w/o commit/symbolized report.
> If it's stack unwinder and it's supposed to be precise, then it may be
> a bug in the unwinder where it reads a wrong location and is imprecise
> (not the frame pointer).
> If it's supposed to be imprecise, then it should use READ_ONCE_NOCHECK
> to read random stack locations.


Please correct me if I say something obviously wrong.

The config used to generate this trace does not set
CONFIG_FRAME_POINTER, so the kernel was using the imprecise stack
unwinder. With CONFIG_FRAME_POINTER set, the KASAN report disappears.
So IIUC, the issue lies in the stack unwinding function, which scans the
stack word by word and therefore reads locations that KASAN has poisoned
(such as stack redzones). As you suggested, I used READ_ONCE_NOCHECK
when reading the stack, and the report also disappears. So the following
patch would be the fix for this, is that correct?


diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index f9a5a7c90ff0..64a9c093aef9 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -101,7 +101,7 @@ void notrace walk_stackframe(struct task_struct *task,
         while (!kstack_end(ksp)) {
                 if (__kernel_text_address(pc) && unlikely(!fn(arg, pc)))
                         break;
-               pc = (*ksp++) - 0x4;
+               pc = READ_ONCE_NOCHECK(*ksp++) - 0x4;
         }
  }
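
To illustrate why the word-by-word scan trips KASAN, here is a small
user-space toy model (not kernel code). It assumes generic KASAN, where
one shadow byte covers 8 bytes of memory, and it copies the shadow bytes
of the row marked '>' in the report below. The helper names and the scan
start address are hypothetical, and any non-zero shadow byte is treated
as poisoned, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3                      /* 8 bytes per shadow byte */
#define ROW_BASE 0xffffffff81e07c00ULL            /* row marked '>' in the report */
#define ROW_END  (ROW_BASE + (16ULL << SHADOW_SCALE_SHIFT))

/* Shadow bytes copied from the report: f1 = left stack redzone,
 * 00 = fully accessible, f3 = right stack redzone. */
static const uint8_t shadow_row[16] = {
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xf1, 0xf1, 0xf1, 0xf1, 0x00, 0x00, 0x00, 0xf3,
};

/* Return the shadow byte covering addr; addr must lie inside the row. */
static uint8_t shadow_of(uint64_t addr)
{
	assert(addr >= ROW_BASE && addr < ROW_END);
	return shadow_row[(addr - ROW_BASE) >> SHADOW_SCALE_SHIFT];
}

/*
 * Scan 8-byte words upward from start, the way the imprecise unwinder
 * does with *ksp++, and return the first address whose shadow byte is
 * non-zero. Returns 0 if the scan reaches the end of the row.
 */
static uint64_t first_poisoned_word(uint64_t start)
{
	for (uint64_t addr = start; addr < ROW_END; addr += 8)
		if (shadow_of(addr) != 0)
			return addr;
	return 0;
}
```

Calling first_poisoned_word(ROW_BASE) returns 0xffffffff81e07c40, the
address in the report: the four f1 bytes are the 32-byte redzone in front
of the object at [32, 56) in the stack_trace_save frame. A checked load
of that word is reported, while READ_ONCE_NOCHECK skips the check.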

Thanks for your quick answer,

Alex


>
>> On 3/2/23 19:01, Chathura Rajapaksha wrote:
>>> Hi Alex/All,
>>>
>>> Kernel is booting now but I get the following KASAN failure in the
>>> bootup itself.
>>> I didn't see this bug was reported before anywhere.
>>>
>>> [    0.000000] Memory: 63436K/129024K available (20385K kernel code, 7120K rwdata, 4096K rodata, 2138K init, 476K bss, 65588K reserved, 0K cma-reserved)
>>> [    0.000000] ==================================================================
>>> [    0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0x1b2/0x1e2
>>> [    0.000000] Read of size 8 at addr ffffffff81e07c40 by task swapper/0
>>> [    0.000000]
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-gae3419fbac84-dirty #7
>>> [    0.000000] Hardware name: riscv-virtio,qemu (DT)
>>> [    0.000000] Call Trace:
>>> [    0.000000] [<ffffffff8000ab9e>] walk_stackframe+0x0/0x1e2
>>> [    0.000000] [<ffffffff80108508>] init_param_lock+0x26/0x2a
>>> [    0.000000] [<ffffffff8000ad4c>] walk_stackframe+0x1ae/0x1e2
>>> [    0.000000] [<ffffffff813d86e0>] dump_stack_lvl+0x22/0x36
>>> [    0.000000] [<ffffffff813bd17a>] print_report+0x198/0x4a8
>>> [    0.000000] [<ffffffff80108508>] init_param_lock+0x26/0x2a
>>> [    0.000000] [<ffffffff8000ad4c>] walk_stackframe+0x1ae/0x1e2
>>> [    0.000000] [<ffffffff8023bd52>] kasan_report+0x9a/0xc8
>>> [    0.000000] [<ffffffff8000ad4c>] walk_stackframe+0x1ae/0x1e2
>>> [    0.000000] [<ffffffff8000ad4c>] walk_stackframe+0x1ae/0x1e2
>>> [    0.000000] [<ffffffff80108748>] stack_trace_save+0x88/0xa6
>>> [    0.000000] [<ffffffff801086bc>] filter_irq_stacks+0x8a/0x8e
>>> [    0.000000] [<ffffffff800b65e2>] devkmsg_read+0x3f8/0x3fc
>>> [    0.000000] [<ffffffff8023b2de>] kasan_save_stack+0x2c/0x56
>>> [    0.000000] [<ffffffff80108744>] stack_trace_save+0x84/0xa6
>>> [    0.000000] [<ffffffff8023b31a>] kasan_set_track+0x12/0x20
>>> [    0.000000] [<ffffffff8023b8f6>] __kasan_slab_alloc+0x58/0x5e
>>> [    0.000000] [<ffffffff8023aeae>] __kmem_cache_create+0x21e/0x39a
>>> [    0.000000] [<ffffffff8141623e>] create_boot_cache+0x70/0x9c
>>> [    0.000000] [<ffffffff8141b5f6>] kmem_cache_init+0x6c/0x11e
>>> [    0.000000] [<ffffffff8140125a>] mm_init+0xd8/0xfe
>>> [    0.000000] [<ffffffff8140145c>] start_kernel+0x190/0x3ca
>>> [    0.000000]
>>> [    0.000000] The buggy address belongs to stack of task swapper/0
>>> [    0.000000]  and is located at offset 0 in frame:
>>> [    0.000000]  stack_trace_save+0x0/0xa6
>>> [    0.000000]
>>> [    0.000000] This frame has 1 object:
>>> [    0.000000]  [32, 56) 'c'
>>> [    0.000000]
>>> [    0.000000] The buggy address belongs to the physical page:
>>> [    0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x82007
>>> [    0.000000] flags: 0x1000(reserved|zone=0)
>>> [    0.000000] raw: 0000000000001000 ff60000007ca5090 ff60000007ca5090 0000000000000000
>>> [    0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff
>>> [    0.000000] page dumped because: kasan: bad access detected
>>> [    0.000000]
>>> [    0.000000] Memory state around the buggy address:
>>> [    0.000000]  ffffffff81e07b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> [    0.000000]  ffffffff81e07b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> [    0.000000] >ffffffff81e07c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3
>>> [    0.000000]                                            ^
>>> [    0.000000]  ffffffff81e07c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>>> [    0.000000]  ffffffff81e07d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> [    0.000000] ==================================================================
>>
>> I was able to reproduce the exact same trace. I'll debug that tomorrow;
>> I hope it is a real bug :)
>>
>> Thanks for the report, Chathura,
>>
>> Alex
>>
>>
>>> Best,
>>> Chath
>>>
>>> On Thu, Mar 2, 2023 at 11:25 AM Chathura Rajapaksha
>>> <chathura.abeyrathne.lk at gmail.com> wrote:
>>>> Hi Alex,
>>>>
>>>> Thank you very much, kernel booted up with the patches you mentioned.
>>>> Bootup was pretty slow compared to before though (on a dev board).
>>>> I guess that is kind of expected with KASAN enabled.
>>>> Thanks again.
>>>>
>>>> Regards,
>>>> Chath
>>>>
>>>> On Thu, Mar 2, 2023 at 2:50 AM Alexandre Ghiti <alex at ghiti.fr> wrote:
>>>>> Hi Chathura,
>>>>>
>>>>> On 3/2/23 04:13, Chathura Rajapaksha wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I observed that RISC-V Linux hangs when I enable KASAN.
>>>>>> Without KASAN it works fine with QEMU.
>>>>>> I am using the commit ae3419fbac845b4d3f3a9fae4cc80c68d82cdf6e
>>>>>>
>>>>>> When KASAN is enabled, QEMU hangs after OpenSBI prints.
>>>>>>
>>>>>> I noticed a similar issue was reported before in
>>>>>> https://lore.kernel.org/lkml/CACT4Y+ZmuOpyf_0vHTT4t3wkmJuW8Ezvcg7v6yDVd8YOViS=GA@mail.gmail.com/t/
>>>>>> But I believe I have the patch mentioned in that thread.
>>>>> I proposed a series that will be included in 6.3 regarding KASAN issues
>>>>> here: https://patchwork.kernel.org/project/linux-riscv/list/?series=718458
>>>>>
>>>>> Can you give it a try and tell me if it works better?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>>> My kernel config:
>>>>>> https://drive.google.com/file/d/1j9nU7f9MxCc_i-UHUCTvo7o6nDrcUz0w/view?usp=sharing
>>>>>>
>>>>>> Best regards,
>>>>>> Chath
>>>>>>
>>>>>> _______________________________________________
>>>>>> linux-riscv mailing list
>>>>>> linux-riscv at lists.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>
