[PATCH v6 07/30] arm64: Reset POR_EL1 on exception entry
Kevin Brodsky
kevin.brodsky at arm.com
Tue May 5 08:42:50 PDT 2026
On 27/02/2026 18:54, Kevin Brodsky wrote:
> POR_EL1 will be modified, through the kpkeys framework, in order to
> grant temporary RW access to certain keys. If an exception occurs
> in the middle of a "critical section" where POR_EL1 is set to a
> privileged value, it is preferable to reset it to its default value
> upon taking the exception to minimise the amount of code running at
> higher kpkeys level.
It turns out there is a corner case where this doesn't play well with
patch 28 (batching using lazy MMU mode). I got the following splat:
[ 33.603892] Unable to handle kernel write to read-only memory at
virtual address ffff00087fbbbd78
[ 33.603969] Mem abort info:
[ 33.604028] ESR = 0x000000409600004f
[ 33.604058] EC = 0x25: DABT (current EL), IL = 32 bits
[ 33.604101] SET = 0, FnV = 0
[ 33.604133] EA = 0, S1PTW = 0
[ 33.604165] FSC = 0x0f: level 3 permission fault
[ 33.604200] Data abort info:
[ 33.604222] ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000040
[ 33.604259] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 33.604303] GCS = 0, Overlay = 1, DirtyBit = 0, Xs = 0
[ 33.604345] swapper pgtable: 4k pages, 48-bit VAs,
pgdp=00000000eec2a000
[ 33.604397] [ffff00087fbbbd78] pgd=0000000000000000,
p4d=18000008fffff403, pud=18000008ffa2d403, pmd=18000008ff82f403,
pte=10e80008ffbbb707
[ 33.605031] Internal error: Oops: 000000409600004f [#1] SMP
[ 33.605596] Modules linked in:
[ 33.605690] CPU: 0 UID: 0 PID: 1 Comm: systemd Tainted: G
N 7.1.0-rc2-00028-g497c3a31207b #371 PREEMPT
[ 33.605864] Tainted: [N]=TEST
[ 33.605933] Hardware name: FVP Base RevC (DT)
[ 33.606012] pstate: 141402009 (nZcv daif +PAN -UAO -TCO +DIT
-SSBS BTYPE=--)
[ 33.606140] pc : pageattr_pte_entry+0x18/0x118
[ 33.606272] lr : walk_pte_range_inner+0x1d8/0x480
[ 33.606393] sp : ffff80008005b5a0
[ 33.606467] x29: ffff80008005b5d0 x28: ffffa991675fd6b0 x27:
ffff00080e5b0000
[ 33.606662] x26: ffff00080e7af000 x25: 0010000000000001 x24:
0040000000000001
[ 33.606855] x23: 0040000000000041 x22: ffff00080e5b0000 x21:
ffff80008005b740
[ 33.607052] x20: ffff00087fbbbd78 x19: ffff00080e5af000 x18:
0000000000000000
[ 33.607245] x17: ffff0008001d2240 x16: 0000000000000004 x15:
0000000000000000
[ 33.607434] x14: ffff00080a80b810 x13: 000000000000b706 x12:
0000000000000001
[ 33.607622] x11: 0000000000000000 x10: 0000000000000000 x9 :
0000000000000020
[ 33.607809] x8 : ffffa991686a7130 x7 : ffff00880e5af000 x6 :
0000000000000072
[ 33.608000] x5 : 0000000000000003 x4 : ffffa9916625e028 x3 :
0000000000000002
[ 33.608187] x2 : 0000000000000000 x1 : 00e800088e5af707 x0 :
ffff00087fbbbd78
[ 33.608378] Call trace:
[ 33.608441] pageattr_pte_entry+0x18/0x118 (P)
[ 33.608587] walk_pgd_range+0x648/0x94c
[ 33.608716] walk_kernel_page_table_range_lockless+0x5c/0x98
[ 33.608864] update_range_prot+0x8c/0x1a4
[ 33.609007] set_memory_pkey+0x48/0x80
[ 33.609149] kpkeys_pgtable_free+0x40/0x9c
[ 33.609305] pgd_free+0xd8/0x120
[ 33.609429] __mmdrop+0x54/0x1d0
[ 33.609552] finish_task_switch.isra.0+0x234/0x2c4
[ 33.609714] __schedule+0x3ac/0xf00
[ 33.609860] preempt_schedule_irq+0x3c/0x7c
[ 33.610013] raw_irqentry_exit_cond_resched+0x2c/0x54
[ 33.610154] arm64_exit_to_kernel_mode+0x40/0x5c
[ 33.610290] el1_interrupt+0x48/0x60
[ 33.610416] el1h_64_irq_handler+0x18/0x24
[ 33.610553] el1h_64_irq+0x8c/0x90
[ 33.610672] __vunmap_range_noflush+0x310/0x540 (P)
[ 33.610829] remove_vm_area+0x50/0xa4
[ 33.610977] vfree+0x38/0x274
[ 33.611118] n_tty_close+0x40/0xa8
[ 33.611234] tty_ldisc_close+0x4c/0xb0
[ 33.611360] tty_ldisc_kill+0x30/0x64
[ 33.611485] tty_ldisc_release+0xd0/0x1b0
[ 33.611615] tty_release_struct+0x20/0x88
[ 33.611766] tty_release+0x384/0x480
[ 33.611912] __fput+0xd0/0x300
[ 33.612041] fput_close_sync+0x38/0x108
[ 33.612180] __arm64_sys_close+0x38/0x7c
[ 33.612308] invoke_syscall.constprop.0+0x40/0x108
[ 33.612447] el0_svc_common.constprop.0+0x38/0xd8
[ 33.612589] do_el0_svc+0x1c/0x28
[ 33.612720] el0_svc+0x38/0x148
[ 33.612846] el0t_64_sync_handler+0xa0/0xe4
[ 33.612984] el0t_64_sync+0x198/0x19c
[ 33.613137] Code: a9400c42 8a230021 aa020021 1400000a (f9000001)
[ 33.613230] ---[ end trace 0000000000000000 ]---
[ 33.974524] Kernel panic - not syncing: Oops: Fatal exception
What happened is that a thread entered lazy MMU mode in
vunmap_pte_range() (inlined) and then an IRQ fired. On the exit path of
the IRQ, another thread got scheduled. Later, the original thread was
scheduled again, and it so happened that finish_task_switch() had an
mm to drop (mmdrop_lazy_tlb_sched(mm)) and held the last reference to
that mm. We then proceeded to free the PGD and eventually wrote to a
linear map page table to reset the pkey.
Because this patch resets POR_EL1 on exception entry, anything running
before exception return uses the default POR_EL1 value, which does not
grant write access to page tables. This is indeed the intention, but as
this crash shows, it comes with an implicit assumption that the
context-switching machinery does not itself write to page tables (at
least not on the irqexit path).
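
To make the sequence concrete, here is a minimal user-space model of it.
This is only a sketch, not kernel code: every name in it (cpu_model,
task_model, KPKEYS_LVL_*, pgtable_write(), ...) is hypothetical, and it
assumes that with the batching from patch 28 a page table write made in
lazy MMU mode skips its own level switch and relies on POR_EL1 having
been raised when lazy MMU mode was entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kpkeys levels, for illustration only. */
enum kpkeys_level { KPKEYS_LVL_DEFAULT, KPKEYS_LVL_PGTABLES };

struct cpu_model  { enum kpkeys_level por_el1; }; /* stands in for POR_EL1 */
struct task_model { bool in_lazy_mmu; };          /* per-task lazy MMU state */

/*
 * Model of a page table write. Outside lazy MMU mode the writer raises
 * the level itself; inside lazy MMU mode it assumes (our assumption about
 * the batching) that the level is already raised. Returns true if the
 * write would be permitted, false if it would fault.
 */
static bool pgtable_write(struct cpu_model *cpu, const struct task_model *t)
{
	enum kpkeys_level prev = cpu->por_el1;
	bool ok;

	if (!t->in_lazy_mmu)
		cpu->por_el1 = KPKEYS_LVL_PGTABLES;
	ok = (cpu->por_el1 == KPKEYS_LVL_PGTABLES);
	cpu->por_el1 = prev;
	return ok;
}

/* Exception entry: save the interrupted value, optionally reset it. */
static enum kpkeys_level exception_entry(struct cpu_model *cpu, bool reset)
{
	enum kpkeys_level saved = cpu->por_el1;

	if (reset)
		cpu->por_el1 = KPKEYS_LVL_DEFAULT;
	return saved;
}

/* Exception return: restore the interrupted value. */
static void exception_return(struct cpu_model *cpu, enum kpkeys_level saved)
{
	cpu->por_el1 = saved;
}

/*
 * Replays the reported sequence: enter lazy MMU mode, take an IRQ, then
 * write to a page table before exception return (the pgd_free() ->
 * set_memory_pkey() step in the trace). Returns whether the write is
 * permitted in this model.
 */
bool irq_exit_pgtable_write_ok(bool reset_on_entry)
{
	struct cpu_model cpu = { .por_el1 = KPKEYS_LVL_DEFAULT };
	struct task_model task = { .in_lazy_mmu = false };
	enum kpkeys_level saved;
	bool ok;

	/* Enter lazy MMU mode: raise the level once for the whole batch. */
	task.in_lazy_mmu = true;
	cpu.por_el1 = KPKEYS_LVL_PGTABLES;

	saved = exception_entry(&cpu, reset_on_entry);
	ok = pgtable_write(&cpu, &task);
	exception_return(&cpu, saved);

	return ok;
}
```

In this model irq_exit_pgtable_write_ok(true) returns false (the write
faults, as in the splat), while irq_exit_pgtable_write_ok(false) returns
true because the raised level is still in effect on the IRQ exit path.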
This patch isn't functionally required for page table protection so it
will be dropped in RFC v7. Maybe lazy MMU mode could be paused for the
duration of finish_task_switch() instead, but I'm not sure whether this
is a generic enough solution.
- Kevin