CPU stalls when handling PAN emulation?
Kees Cook
keescook at chromium.org
Thu Jan 11 16:11:27 PST 2024
Hi,
Mark noticed that LKDTM's EXEC_USERSPACE test[1] (which trips the PAN
emulation, CONFIG_CPU_SW_DOMAIN_PAN=y, on 32-bit arm) appears to be
creating a situation that leads to a CPU stall. make_task_dead() reports:
note: ...[NNN] exited with preempt_count 1
which isn't seen with other Oopses (like EXEC_RODATA). (Both also
report "note: ...[NNN] exited with irqs disabled", but this seems
survivable.) I note that the ACCESS_USERSPACE test does _not_
have this problem.
ACCESS_USERSPACE (survivable) starts with:
lkdtm: Performing direct entry ACCESS_USERSPACE
lkdtm: attempting bad read at 76f44000
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0x76f44000
EXEC_USERSPACE (leads to CPU stall) starts with:
lkdtm: Performing direct entry EXEC_USERSPACE
lkdtm: attempting ok execution at 8075bf18
lkdtm: attempting bad execution at 76f6f000
Unhandled prefetch abort: page domain fault (0x01b) at 0x76f6f000
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0x76f6f000
So they're both getting caught by the domain checks, but there looks to
be a second fault for EXEC_USERSPACE. (More below.)
For the CPU stall to appear, there needs to be (at least) a second Oops.
As an example, if I run EXEC_USERSPACE and then EXEC_RODATA, the latter
stops sending to the console very quickly, reporting only the very start
of the Oops:
lkdtm: Performing direct entry EXEC_USERSPACE
lkdtm: attempting ok execution at 8075bf18
lkdtm: attempting bad execution at 76f10000
Unhandled prefetch abort: page domain fault (0x01b) at 0x76f10000
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0x76f10000
[76f10000] *pgd=44f6e835, *pte=469a455f, *ppte=469a4c7e
Internal error: : 1b [#1] SMP ARM
Modules linked in:
...
Stack: (0xf0959d10 to 0xf095a000)
...
copy_from_kernel_nofault from is_valid_bugaddr+0x40/0x84
r7:f0959e08 r6:76f10000 r5:81a60000 r4:00000000
is_valid_bugaddr from report_bug+0x4c/0x1b8
r4:80e84f4c
report_bug from die+0xb4/0x2f0
r10:8100a3dc r9:80f0ead0 r8:60070193 r7:81a60000 r6:0000001b r5:80cf5e1c
r4:f0959e08
die from arm_notify_die+0x54/0x58
r10:81a60000 r9:81a60000 r8:80fab2ec r7:80f0ffc8 r6:f0959e08 r5:76f10000
r4:0000001b
arm_notify_die from do_PrefetchAbort+0x90/0x98
do_PrefetchAbort from __pabt_svc+0x5c/0xa0
Exception stack(0xf0959e08 to 0xf0959e50)
r7:f0959e3c r6:ffffffff r5:60070013 r4:76f10000
lkdtm_EXEC_USERSPACE from lkdtm_do_action+0x2c/0x4c
r4:84fc6000
lkdtm_do_action from direct_entry+0x130/0x150
...
---[ end trace 0000000000000000 ]---
note: cat[1271] exited with irqs disabled
note: cat[1271] exited with preempt_count 1
lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at 8075bf18
lkdtm: attempting bad execution at 80b42118
8<--- cut here ---
Unable to handle kernel paging request at virtual address 80b42118 when execute
[80b42118] *pgd=40a1940e(bad)
****nothing else****
Here the 2 crashes in EXEC_USERSPACE are visible. Does anyone see
something obvious in the exception handling that might cause this? I'm
not sure what to do next to figure out what's going wrong.
Any help greatly appreciated! :)
-Kees
[1] To run an LKDTM test, build with CONFIG_LKDTM=y, mount debugfs and do:
echo "EXEC_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
(The pipe to cat is to avoid killing your shell on Oopses and BUGs.)
To list all available tests:
cat /sys/kernel/debug/provoke-crash/DIRECT
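For convenience, the two-step sequence described above (EXEC_USERSPACE
followed by EXEC_RODATA) can be wrapped in a small script. This is only a
minimal sketch: the function names run_lkdtm and reproduce and the DIRECT
override variable are made up for illustration, and it assumes debugfs is
already mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Sketch only: run_lkdtm, reproduce, and the DIRECT override are
# hypothetical names; the path and test names come from the text above.
DIRECT="${DIRECT:-/sys/kernel/debug/provoke-crash/DIRECT}"

run_lkdtm() {
    # Pipe through cat so that an Oops kills cat rather than this shell.
    echo "$1" | cat >"$DIRECT"
}

reproduce() {
    # First Oops: expected to leave preempt_count at 1 on exit.
    run_lkdtm EXEC_USERSPACE
    # Second Oops: the one whose console output stops early.
    run_lkdtm EXEC_RODATA
}
```

Source the file and call "reproduce" as root on the affected system.
Setting DIRECT to a scratch file first allows a dry run that triggers
nothing.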
--
Kees Cook