[PATCH] arm64: Print provenance information for addresses in registers

Mark Rutland mark.rutland at arm.com
Mon Feb 5 04:55:53 PST 2024


On Thu, Feb 01, 2024 at 04:38:11PM +0100, Stefan Wiehler wrote:
> Like commit 5aa6b70ed182 ("arm: print alloc free paths for address in
> registers") already does for 32-bit ARM, print available provenance information
> for addresses in registers in case of internal error as a debugging aid:
> 
>   Unable to handle kernel write to read-only memory at virtual address ffffffc00117cd5c
>   ...
>   x2 : 0000000000001b98 x1 : 0000000000000000 x0 : ffffffc00117cd5c
>   ...
>   Register x0 information: 14-page vmalloc region starting at 0xffffffc00117a000 allocated at load_module+0x15c4/0x2bc0
>   Register x1 information: NULL pointer
>   Register x2 information: non-paged memory
>   ...

Regardless of what 32-bit ARM does, I don't think it makes sense to dump this
for all the GPRs. It's incredibly verbose and not all that helpful, e.g.
testing with the LKDTM WRITE_RO test I get:

| # echo WRITE_RO > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry WRITE_RO
| lkdtm: attempting bad rodata write at ffffadee958c1bf0
| Unable to handle kernel write to read-only memory at virtual address ffffadee958c1bf0
| Mem abort info:
|   ESR = 0x000000009600004e
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x0e: level 2 permission fault
| Data abort info:
|   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
|   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
|   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d6d000
| [ffffadee958c1bf0] pgd=100000043ffff803, p4d=100000043ffff803, pud=100000043fffe803, pmd=0060000041600f81
| Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
| Modules linked in:
| CPU: 4 PID: 147 Comm: sh Not tainted 6.8.0-rc3-00001-gc66454d88811 #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : lkdtm_WRITE_RO+0x3c/0x54
| lr : lkdtm_WRITE_RO+0x24/0x54
| sp : ffff800080913c80
| x29: ffff800080913c80 x28: ffff0000c09ca200 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: 0000aaaaee418930
| x23: ffff0000c0645780 x22: ffff800080913df0 x21: ffffadee96c957a0
| x20: 0000000000000009 x19: ffffadee958c1000 x18: 0000000000000006
| x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080913720
| x14: 0000000000000000 x13: ffffadee969310b0 x12: 0000000000000432
| x11: 0000000000000166 x10: ffffadee969890b0 x9 : ffffadee969310b0
| x8 : 00000000ffffefff x7 : ffffadee969890b0 x6 : 80000000fffff000
| x5 : ffff0003fdfb5cc8 x4 : 0000000000000000 x3 : 0000000000000000
| x2 : 00000000abcd1234 x1 : 000000000198b861 x0 : ffffadee95e334a0
| Call trace:
|  lkdtm_WRITE_RO+0x3c/0x54
|  lkdtm_do_action+0x1c/0x30
|  direct_entry+0xbc/0x144
|  full_proxy_write+0x60/0xb4
|  vfs_write+0xd0/0x35c
|  ksys_write+0x70/0x104
|  __arm64_sys_write+0x1c/0x28
|  invoke_syscall+0x48/0x110
|  el0_svc_common.constprop.0+0x40/0xe0
|  do_el0_svc+0x1c/0x28
|  el0_svc+0x34/0xb4
|  el0t_64_sync_handler+0x120/0x12c
|  el0t_64_sync+0x190/0x194
| Register x0 information: 0-page vmalloc region starting at 0xffffadee95470000 allocated at paging_init+0x148/0x6f4
| Register x1 information: non-paged memory
| Register x2 information: non-paged memory
| Register x3 information: NULL pointer
| Register x4 information: NULL pointer
| Register x5 information: non-slab/vmalloc memory
| Register x6 information: non-paged memory
| Register x7 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
| Register x8 information: non-paged memory
| Register x9 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
| Register x10 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
| Register x11 information: non-paged memory
| Register x12 information: non-paged memory
| Register x13 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
| Register x14 information: NULL pointer
| Register x15 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
| Register x16 information: NULL pointer
| Register x17 information: NULL pointer
| Register x18 information: non-paged memory
| Register x19 information: 0-page vmalloc region starting at 0xffffadee95470000 allocated at paging_init+0x148/0x6f4
| Register x20 information: non-paged memory
| Register x21 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
| Register x22 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
| Register x23 information: slab dentry start ffff0000c0645780 pointer offset 0 size 192
| Register x24 information: non-paged memory
| Register x25 information: NULL pointer
| Register x26 information: NULL pointer
| Register x27 information: NULL pointer
| Register x28 information: slab task_struct start ffff0000c09ca200 pointer offset 0 size 4352
| Register x29 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
| Code: f2b579a2 900089c0 ca020021 91128000 (f905fa61)
| ---[ end trace 0000000000000000 ]---

Looking at the dump, the vast majority of those "Register X?? information:"
lines are irrelevant, and it's difficult to see the wood for the trees.

Why do we need this for (almost) all registers, rather than just the faulting
address?

In die_kernel_fault() we already print some information regarding the psecific
faulting address:

| static void die_kernel_fault(const char *msg, unsigned long addr,
|                              unsigned long esr, struct pt_regs *regs)
| {
|         bust_spinlocks(1);
| 
|         pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
|                  addr);
| 
|         kasan_non_canonical_hook(addr);
| 
|         mem_abort_decode(esr);
| 
|         show_pte(addr);
|         die("Oops", regs, esr);
|         bust_spinlocks(0);
|         make_task_dead(SIGKILL);
| }

... and IMO it'd make more sense to call mem_dump_obj() there, e.g.

| static void die_kernel_fault(const char *msg, unsigned long addr,
|                              unsigned long esr, struct pt_regs *regs)
| {
|         bust_spinlocks(1);
| 
|         pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
|                  addr);
| 
|         pr_alert("Fauling address:");
|         mem_dump_obj((void *)addr);
|         kasan_non_canonical_hook(addr);
| 
|         mem_abort_decode(esr);
| 
|         show_pte(addr);
|         die("Oops", regs, esr);
|         bust_spinlocks(0);
|         make_task_dead(SIGKILL);
| }

... which produces significantly more legible output, e.g.

| # echo WRITE_RO > /sys/kernel/debug/provoke-crash/DIRECT 
| lkdtm: Performing direct entry WRITE_RO
| lkdtm: attempting bad rodata write at ffffa66f5e2c1bf0
| Unable to handle kernel write to read-only memory at virtual address ffffa66f5e2c1bf0
| Fauling address: 0-page vmalloc region starting at 0xffffa66f5de70000 allocated at paging_init+0x148/0x6f4
| Mem abort info:
|   ESR = 0x000000009600004e
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x0e: level 2 permission fault
| Data abort info:
|   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
|   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
|   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d6d000
| [ffffa66f5e2c1bf0] pgd=100000043ffff803, p4d=100000043ffff803, pud=100000043fffe803, pmd=0060000041600f81
| Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
| Modules linked in:
| CPU: 3 PID: 148 Comm: sh Not tainted 6.8.0-rc3-00002-gf95e3bd06270-dirty #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : lkdtm_WRITE_RO+0x3c/0x54
| lr : lkdtm_WRITE_RO+0x24/0x54
| sp : ffff80008098bc80
| x29: ffff80008098bc80 x28: ffff0000c144a200 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: 0000aaab235e21d0
| x23: ffff0000c06459c0 x22: ffff80008098bdf0 x21: ffffa66f5f6957a0
| x20: 0000000000000009 x19: ffffa66f5e2c1000 x18: 0000000000000006
| x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008098b720
| x14: 0000000000000000 x13: ffffa66f5f3310b0 x12: 0000000000000432
| x11: 0000000000000166 x10: ffffa66f5f3890b0 x9 : ffffa66f5f3310b0
| x8 : 00000000ffffefff x7 : ffffa66f5f3890b0 x6 : 80000000fffff000
| x5 : ffff0003fdf9fcc8 x4 : 0000000000000000 x3 : 0000000000000000
| x2 : 00000000abcd1234 x1 : 000000000198b861 x0 : ffffa66f5e8334b0
| Call trace:
|  lkdtm_WRITE_RO+0x3c/0x54
|  lkdtm_do_action+0x1c/0x30
|  direct_entry+0xbc/0x144
|  full_proxy_write+0x60/0xb4
|  vfs_write+0xd0/0x35c
|  ksys_write+0x70/0x104
|  __arm64_sys_write+0x1c/0x28
|  invoke_syscall+0x48/0x110
|  el0_svc_common.constprop.0+0x40/0xe0
|  do_el0_svc+0x1c/0x28
|  el0_svc+0x34/0xb4
|  el0t_64_sync_handler+0x120/0x12c
|  el0t_64_sync+0x190/0x194
| Code: f2b579a2 900089c0 ca020021 9112c000 (f905fa61) 
| ---[ end trace 0000000000000000 ]---

... and likewise even when we hit addresses without KASAN shadow:

| # echo ACCESS_NULL > /sys/kernel/debug/provoke-crash/DIRECT 
| lkdtm: Performing direct entry ACCESS_NULL
| lkdtm: attempting bad read at 0000000000000000
| Unable to handle kernel paging request at virtual address dfff800000000000
| Fauling address: non-paged memory
| KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
| Mem abort info:
|   ESR = 0x0000000096000005
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x05: level 1 translation fault
| Data abort info:
|   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
|   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
|   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| [dfff800000000000] address between user and kernel address ranges
| Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 2 PID: 149 Comm: sh Not tainted 6.8.0-rc3-00002-gf95e3bd06270-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : lkdtm_ACCESS_NULL+0x2c/0xa4
| lr : lkdtm_ACCESS_NULL+0x24/0xa4
| sp : ffff800080d87ab0
| x29: ffff800080d87ab0 x28: ffff0000c5296f14 x27: 1fffe00018a52df5
| x26: ffff0000c5296fa0 x25: 1fffe00018a52df4 x24: ffff0000c8fcda00
| x23: 0000aaaadb0a7040 x22: ffff800080d87d70 x21: ffffcb98b050dae0
| x20: 000000000000000c x19: ffffcb98acd53f40 x18: ffff0003fdff56c0
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 3030303030303030 x12: ffff7000101b0ee1
| x11: 1ffff000101b0ee0 x10: ffff7000101b0ee0 x9 : dfff800000000000
| x8 : 00008fffefe4f120 x7 : ffff800080d87707 x6 : 0000000000000001
| x5 : ffff800080d87700 x4 : 1fffe000191f9b41 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffff0000c8fcda00 x0 : dfff800000000000
| Call trace:
|  lkdtm_ACCESS_NULL+0x2c/0xa4
|  lkdtm_do_action+0x3c/0x68
|  direct_entry+0x130/0x228
|  full_proxy_write+0xd4/0x140
|  vfs_write+0x1e4/0x8ec
|  ksys_write+0xf4/0x1e0
|  __arm64_sys_write+0x6c/0x9c
|  invoke_syscall+0x6c/0x254
|  el0_svc_common.constprop.0+0xac/0x230
|  do_el0_svc+0x40/0x58
|  el0_svc+0x38/0x70
|  el0t_64_sync_handler+0x120/0x12c
|  el0t_64_sync+0x190/0x194
| Code: aa1303e0 97adbc11 d2d00000 f2fbffe0 (39c00000) 
| ---[ end trace 0000000000000000 ]---

> Signed-off-by: Stefan Wiehler <stefan.wiehler at nokia.com>
> ---
>  arch/arm64/include/asm/system_misc.h |  1 +
>  arch/arm64/kernel/process.c          | 11 +++++++++++
>  arch/arm64/kernel/traps.c            |  1 +
>  3 files changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
> index c34344256762..5b9b02fee988 100644
> --- a/arch/arm64/include/asm/system_misc.h
> +++ b/arch/arm64/include/asm/system_misc.h
> @@ -31,6 +31,7 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned long,
>  
>  struct mm_struct;
>  extern void __show_regs(struct pt_regs *);
> +extern void __show_regs_alloc_free(struct pt_regs *);
>  
>  #endif	/* __ASSEMBLY__ */
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 7387b68c745b..b6fc2ac6e007 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -197,6 +197,17 @@ static void print_pstate(struct pt_regs *regs)
>  	}
>  }
>  
> +void __show_regs_alloc_free(struct pt_regs *regs)
> +{
> +	int i;
> +
> +	/* check for x0 - x29 only */
> +	for (i = 0; i < 30; i++) {
> +		pr_alert("Register x%d information:", i);
> +		mem_dump_obj((void *)regs->regs[i]);
> +	}
> +}

As above, I don't think it makes sense to dump this for a bunch of GPRs, and
I'd strongly prefer that we just do this for the faulting address.

I would also note that X30 can be used as a scratch register in the body of
functions and might hold an interesting address, whereas x2 *should* always
point to a frame record on the stack. So I don't understand why we'd both
dumping x29 but skip x30.

>  void __show_regs(struct pt_regs *regs)
>  {
>  	int i, top_reg;
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 215e6d7f2df8..b7bf22f1b33f 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -197,6 +197,7 @@ static int __die(const char *str, long err, struct pt_regs *regs)
>  
>  	print_modules();
>  	show_regs(regs);
> +	__show_regs_alloc_free(regs);
>  
>  	dump_kernel_instr(KERN_EMERG, regs);

We call die() (and consequently __die()) for a number of errors where the
provenance information isn't interesting (e.g. UNDEF, BTI, FPAC, BUG). As above
I think it makes more sense to only dump this in die_kernel_fault(), and only
for the actual faulting address.

Thanks,
Mark.



More information about the linux-arm-kernel mailing list