[PATCH] arm64/mm: Fix use-after-free due to race between memory hotunplug and ptdump

Dev Jain dev.jain at arm.com
Mon Jul 28 03:42:08 PDT 2025


On 28/07/25 4:01 pm, Dev Jain wrote:
> Memory hotunplug is done under the hotplug lock and ptdump walk is done
> under the init_mm.mmap_lock. Therefore, ptdump and hotunplug can run
> simultaneously without any synchronization. During hotunplug,
> free_empty_tables() is ultimately called to free up the pagetables.
> The following race can happen, where x denotes the level of the pagetable:
>
> CPU1					CPU2
> free_empty_pxd_table
> 					ptdump_walk_pgd()
> 					Get p(x+1)d table from pxd entry
> pxd_clear
> free_hotplug_pgtable_page(p(x+1)dp)
> 					Still using the p(x+1)d table
>
> which leads to a use-after-free.
>
> To solve this, we need to synchronize ptdump_walk_pgd() with
> free_hotplug_pgtable_page() in such a way that ptdump never takes a
> reference on a freed pagetable.
>
> Since this race is very unlikely to happen in practice, we do not want to
> penalize other code paths taking the init_mm mmap_lock. Therefore, we use
> static keys. ptdump will enable the static key - upon observing that,
> the free_empty_pxd_table() functions will get patched in with an
> mmap_read_lock/unlock sequence. A code comment explains in detail how
> a combination of the acquire semantics of static_branch_enable() and the
> barriers in __flush_tlb_kernel_pgtable() ensures that ptdump will never
> get hold of the address of a freed pagetable: either ptdump will block
> the table freeing path by write locking the mmap_lock, or ptdump will
> observe the cleared pxd entry and therefore have no access to
> the isolated p(x+1)d pagetable.
>
> This bug was found by code inspection, as a result of working on [1].
> 1. https://lore.kernel.org/all/20250723161827.15802-1-dev.jain@arm.com/
>
> Cc: <stable at vger.kernel.org>
> Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
> Signed-off-by: Dev Jain <dev.jain at arm.com>
> ---

Right after posting, I guess the first objection is going to be: why not
just wrap free_empty_tables() in mmap_read_lock/unlock? Memory offlining
obviously should not be a hot path, so taking the read lock unconditionally
should be fine, I guess.
