[PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area

Mon Feb 15 10:59:57 PST 2016

On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
> > So far, we have:
> > 
> > KASAN+for-next/kernmap goes wrong
> > KASAN+UBSAN goes wrong
> > 
> > Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
> > have to trim for-next/core down until we figure out where the problem
> > is.
> > 
> > BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
> 
> Can it be related to TLB conflicts, which are supposed to be fixed in
> the "arm64: kasan: avoid TLB conflicts" patch from the "arm64: mm:
> rework page table creation" series?

I can very easily reproduce this with a vanilla 4.5-rc1 kernel by
enabling inline instrumentation (maybe Mark's theory w.r.t. image size
is true).

Here is some information; maybe you can shed some light on this. It
seems to happen only for secondary CPUs on the swapper stack (allocated
via fork_idle(), I think). The generated code looks sane to me, so KASAN
should not complain, but maybe there is some uninitialised shadow, hence
the error.

The report:

-------------->8---------------
BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x354/0x368 at addr ffffffc93651bca8
Read of size 8 by task swapper/1/0
page:ffffffbde6d946c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: kasan: bad access detected
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G    B           4.5.0-rc1+ #163
Hardware name: Juno (DT)
Call trace:
[<ffffffc00008f130>] dump_backtrace+0x0/0x358
[<ffffffc00008f49c>] show_stack+0x14/0x20
[<ffffffc000785dc0>] dump_stack+0xf8/0x188
[<ffffffc000343c0c>] kasan_report_error+0x524/0x550
[<ffffffc000343d50>] __asan_report_load8_noabort+0x40/0x48
[<ffffffc0001f2bc4>] clockevents_program_event+0x354/0x368
[<ffffffc0001f73d4>] tick_program_event+0xac/0x108
[<ffffffc0001d85c8>] hrtimer_start_range_ns+0x8a0/0xb20
[<ffffffc0001f8ba8>] __tick_nohz_idle_enter+0x970/0xca8
[<ffffffc0001f9368>] tick_nohz_idle_enter+0x60/0x98
[<ffffffc0001933ec>] cpu_startup_entry+0x14c/0x448
[<ffffffc000098654>] secondary_start_kernel+0x264/0x2e0
[<0000000080082ecc>] 0x80082ecc
Memory state around the buggy address:
 ffffffc93651bb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc93651bc00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
>ffffffc93651bc80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
                                  ^
 ffffffc93651bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc93651bd80: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
-------------->8---------------

I put some printks in clockevents_program_event() and the SP there is
0xffffffc93651bc70, so the reported address is SP + 0x38, which matches
the report above.
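
As a cross-check, the "Memory state" row marked with '>' can be decoded
by hand. The sketch below is only an illustration, not kernel code. It
assumes the usual KASAN encoding of one shadow byte per 8 bytes of
memory, and that the stack redzone values are the ones defined in
mm/kasan/kasan.h for kernels of this era (0xF1 left, 0xF2 mid,
0xF3 right, 0xF4 partial). The helper names are made up for the example.

```python
# Assumption: one shadow byte covers 8 bytes of memory (KASAN scale shift 3).
KASAN_SHADOW_SCALE = 8

# Assumption: stack redzone markers as defined in mm/kasan/kasan.h.
STACK_REDZONES = {
    0xF1: "stack left redzone",
    0xF2: "stack mid redzone",
    0xF3: "stack right redzone",
    0xF4: "stack partial redzone",
}


def parse_row(line):
    """Split a 'Memory state' row into (row base address, shadow bytes)."""
    addr, _, data = line.lstrip("> ").partition(":")
    return int(addr, 16), [int(b, 16) for b in data.split()]


def shadow_for(addr, row_base, shadow):
    """Return (index, value) of the shadow byte that covers addr."""
    index = (addr - row_base) // KASAN_SHADOW_SCALE
    if not 0 <= index < len(shadow):
        raise ValueError("address is not covered by this row")
    return index, shadow[index]


def describe(value):
    """Give a human-readable meaning for a shadow byte."""
    if value == 0:
        return "fully accessible"
    if 1 <= value <= 7:
        return f"first {value} bytes accessible"
    return STACK_REDZONES.get(value, "other poison value")


# Row and address copied from the report above.
ROW = ">ffffffc93651bc80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00"
FAULT_ADDR = 0xFFFFFFC93651BCA8

if __name__ == "__main__":
    base, shadow = parse_row(ROW)
    index, value = shadow_for(FAULT_ADDR, base, shadow)
    print(f"shadow byte {index} of row {base:#x}: "
          f"{value:#04x} ({describe(value)})")
```

Under those assumptions the faulting address lands on one of the f3
bytes in that row, i.e. shadow that still carries a stack redzone
marker.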

Disassembling the code:

-------------->8---------------
ffffffc0001f2870 <clockevents_program_event>:
ffffffc0001f2870:       a9bc7bfd        stp     x29, x30, [sp,#-64]!
ffffffc0001f2874:       d2dff204        mov     x4, #0xff9000000000             // #280993940373504
ffffffc0001f2878:       910003fd        mov     x29, sp
ffffffc0001f287c:       910103a3        add     x3, x29, #0x40
ffffffc0001f2880:       f2fbffe4        movk    x4, #0xdfff, lsl #48
ffffffc0001f2884:       a90153f3        stp     x19, x20, [sp,#16]
ffffffc0001f2888:       a9025bf5        stp     x21, x22, [sp,#32]
ffffffc0001f288c:       f81f8c61        str     x1, [x3,#-8]!
ffffffc0001f2890:       aa0003f3        mov     x19, x0
ffffffc0001f2894:       53001c55        uxtb    w21, w2
ffffffc0001f2898:       d343fc60        lsr     x0, x3, #3
ffffffc0001f289c:       38e46800        ldrsb   w0, [x0,x4]
ffffffc0001f28a0:       350018e0        cbnz    w0, ffffffc0001f2bbc <clockevents_program_event+0x34c>

[...]

ffffffc0001f2bbc:       aa0303e0        mov     x0, x3
ffffffc0001f2bc0:       94054454        bl      ffffffc000343d10 <__asan_report_load8_noabort>
-------------->8---------------

To me, the instruction at ffffffc0001f288c looks like a normal store to
a stack variable, and the stack boundaries look fine. The load at
ffffffc0001f289c reads the shadow byte for that address and finds it
non-zero, hence the report. But I don't see anything wrong with this
function itself; the only explanation I have is corrupt KASAN shadow.
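
For reference, the arithmetic done by the inline check in the
disassembly above can be replayed outside the kernel. This is a rough
sketch that mirrors only the listed instructions (mov/movk into x4, the
add and pre-indexed str on x3, the lsr and the indexed ldrsb). The
function names are invented for the example, and the SP value is the
one from my printk.

```python
MASK64 = (1 << 64) - 1


def movk(reg, imm16, shift):
    """Emulate MOVK: replace one 16-bit field of reg, keep the other bits."""
    cleared = reg & ~(0xFFFF << shift)
    return (cleared | (imm16 << shift)) & MASK64


def checked_address(sp):
    """Address that the instrumented store in the prologue accesses."""
    x29 = sp            # mov  x29, sp
    x3 = x29 + 0x40     # add  x3, x29, #0x40
    x3 -= 8             # str  x1, [x3,#-8]!  (pre-index writes back to x3)
    return x3 & MASK64


def shadow_address(addr, offset):
    """lsr x0, x3, #3 followed by ldrsb w0, [x0,x4]."""
    return ((addr >> 3) + offset) & MASK64


# mov x4, #0xff9000000000 ; movk x4, #0xdfff, lsl #48
SHADOW_OFFSET = movk(0xFF9000000000, 0xDFFF, 48)

# SP observed inside clockevents_program_event() via printk.
SP = 0xFFFFFFC93651BC70

if __name__ == "__main__":
    addr = checked_address(SP)
    print(f"shadow offset : {SHADOW_OFFSET:#018x}")
    print(f"checked addr  : {addr:#018x}")
    print(f"shadow addr   : {shadow_address(addr, SHADOW_OFFSET):#018x}")
```

The checked address comes out as the one in the KASAN report, so the
instrumentation is looking at the right place. What remains unexplained
is why the shadow byte at that location is non-zero.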

-- 
Catalin