[PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs
David Hildenbrand (Arm)
david at kernel.org
Wed Jun 17 04:07:14 PDT 2026
On 6/8/26 17:57, Andrey Smirnov wrote:
> The vDSO data store ("[vvar]") special mapping is created as a VM_PFNMAP
> mapping and its pages are installed into userspace with vmf_insert_pfn(),
> which produces special PTEs (pte_special()). On x86 and arm64 (and riscv)
> pte_user_accessible_page() only tests the PRESENT/USER bits and does not
> exclude special PTEs, so page_table_check accounts these PFN mappings in
> the per-page anon/file map counters even though they are not rmap-managed
> pages (vm_normal_page() returns NULL for them).
>
> Most of these data pages live in the kernel image and are never freed, so
> the stray accounting is invisible. The time-namespace VVAR page is the
> exception: it is a real alloc_page() page that is released with
> __free_page() in free_time_ns() when the last task of a time namespace
> exits. Across the map / unmap / vdso_join_timens() zap transitions the
> special-PTE accounting is not balanced for this page, so a non-zero
> file_map_count survives to the free path and trips:
>
> kernel BUG at mm/page_table_check.c:143!
> __page_table_check_zero+0xfb/0x130
> __free_frozen_pages+0x52f/0x650
> free_time_ns+0x85/0xc0
> free_nsproxy+0x7f/0x130
> do_exit+0x313/0xa60
> do_group_exit+0x77/0x90
>
> This is reliably reproducible on x86_64 and arm64 under heavy container/CI
> churn that rapidly creates and destroys time namespaces (CLONE_NEWTIME via
> runc / docker-init / tini), and was independently reported by syzbot on
> riscv. It only manifests when CONFIG_PAGE_TABLE_CHECK is active.
>
> Special PTEs have no struct-page rmap semantics and should never have been
> tracked by page table check. Skip them in both the set and clear paths so
> the counters stay balanced (always zero) for PFN-mapped pages, regardless
> of how the architecture defines pte_user_accessible_page(). pte_special()
> is available generically (it is a no-op returning false on architectures
> without ARCH_HAS_PTE_SPECIAL), so this is a single, arch-independent fix.
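To make sure we are talking about the same thing, here is a rough userspace model of the proposed rule (skip special PTEs in both the set and the clear path). This is only a sketch: every model_* name is hypothetical and none of it is the actual mm/page_table_check.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: not the kernel's data structures. */
struct model_pte {
	bool present;
	bool user;
	bool special;	/* set for PFN mappings, e.g. via vmf_insert_pfn() */
};

struct model_page {
	int file_map_count;
};

/* Models the described arch helper: only PRESENT/USER are tested. */
static bool model_user_accessible(struct model_pte pte)
{
	return pte.present && pte.user;
}

/* Proposed rule: a special PTE is never tracked. */
static bool model_tracked(struct model_pte pte)
{
	return model_user_accessible(pte) && !pte.special;
}

static void model_check_set(struct model_page *page, struct model_pte pte)
{
	if (model_tracked(pte))
		page->file_map_count++;
}

static void model_check_clear(struct model_page *page, struct model_pte pte)
{
	if (model_tracked(pte))
		page->file_map_count--;
}

/* Stand-in for the check at free time: true when freeing is fine. */
static bool model_check_zero(const struct model_page *page)
{
	return page->file_map_count == 0;
}

/* A special PTE that is set but never cleared must not leave a count behind. */
static void model_demo(void)
{
	struct model_page page = { 0 };
	struct model_pte vvar = { .present = true, .user = true, .special = true };

	model_check_set(&page, vvar);
	assert(model_check_zero(&page));
}
```

With the skip applied on both sides, the counter for a PFN-mapped page stays at zero no matter how the set and clear calls pair up.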
Using pte_special() is usually a sign that something is shaky, as it
silently does nothing on architectures that don't support
CONFIG_ARCH_HAS_PTE_SPECIAL.
I assume the relevant architectures (loongarch32?) do not support
CONFIG_PAGE_TABLE_CHECK:
arch/arm64/Kconfig: select ARCH_SUPPORTS_PAGE_TABLE_CHECK
arch/powerpc/Kconfig: select ARCH_SUPPORTS_PAGE_TABLE_CHECK if !HUGETLB_PAGE
arch/riscv/Kconfig: select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
arch/s390/Kconfig: select ARCH_SUPPORTS_PAGE_TABLE_CHECK
arch/x86/Kconfig: select ARCH_SUPPORTS_PAGE_TABLE_CHECK if X86_64
mm/Kconfig.debug: depends on ARCH_SUPPORTS_PAGE_TABLE_CHECK
Can we enforce somehow that we expect CONFIG_ARCH_HAS_PTE_SPECIAL, so anybody
unlocking ARCH_SUPPORTS_PAGE_TABLE_CHECK is aware of this?
For example, through a BUILD_BUG_ON?
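Something along these lines, as an untested userspace sketch of the idea. The MODEL_* macros are hypothetical stand-ins: in the kernel the two flags would be IS_ENABLED(CONFIG_PAGE_TABLE_CHECK) and IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL), and the assertion would be a real BUILD_BUG_ON() placed wherever it fits best.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for IS_ENABLED(CONFIG_...); plain 0/1 macros so
 * that the sketch builds outside the kernel tree.
 */
#define MODEL_PAGE_TABLE_CHECK		1
#define MODEL_ARCH_HAS_PTE_SPECIAL	1

/* Userspace stand-in for BUILD_BUG_ON(): break the build when cond is true. */
#define MODEL_BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Page table check relies on pte_special() being meaningful. */
MODEL_BUILD_BUG_ON(MODEL_PAGE_TABLE_CHECK && !MODEL_ARCH_HAS_PTE_SPECIAL);

/* Runtime view of the same condition, only so the sketch can be exercised. */
static bool model_config_ok(void)
{
	return !(MODEL_PAGE_TABLE_CHECK && !MODEL_ARCH_HAS_PTE_SPECIAL);
}

static void model_config_selftest(void)
{
	assert(model_config_ok());
}
```

Flipping MODEL_ARCH_HAS_PTE_SPECIAL to 0 makes the build fail at the assertion, which is the kind of loud failure I'd want for anybody selecting ARCH_SUPPORTS_PAGE_TABLE_CHECK on a new architecture.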
--
Cheers,
David