[PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs

David Hildenbrand (Arm) david at kernel.org
Wed Jun 17 04:07:14 PDT 2026


On 6/8/26 17:57, Andrey Smirnov wrote:
> The vDSO data store ("[vvar]") special mapping is created as a VM_PFNMAP
> mapping and its pages are installed into userspace with vmf_insert_pfn(),
> which produces special PTEs (pte_special()). On x86 and arm64 (and riscv)
> pte_user_accessible_page() only tests the PRESENT/USER bits and does not
> exclude special PTEs, so page_table_check accounts these PFN mappings in
> the per-page anon/file map counters even though they are not rmap-managed
> pages (vm_normal_page() returns NULL for them).
> 
> Most of these data pages live in the kernel image and are never freed, so
> the stray accounting is invisible. The time-namespace VVAR page is the
> exception: it is a real alloc_page() page that is released with
> __free_page() in free_time_ns() when the last task of a time namespace
> exits. Across the map / unmap / vdso_join_timens() zap transitions the
> special-PTE accounting is not balanced for this page, so a non-zero
> file_map_count survives to the free path and trips:
> 
>   kernel BUG at mm/page_table_check.c:143!
>   __page_table_check_zero+0xfb/0x130
>   __free_frozen_pages+0x52f/0x650
>   free_time_ns+0x85/0xc0
>   free_nsproxy+0x7f/0x130
>   do_exit+0x313/0xa60
>   do_group_exit+0x77/0x90
> 
> This is reliably reproducible on x86_64 and arm64 under heavy container/CI
> churn that rapidly creates and destroys time namespaces (CLONE_NEWTIME via
> runc / docker-init / tini), and was independently reported by syzbot on
> riscv. It only manifests when CONFIG_PAGE_TABLE_CHECK is active.
> 
> Special PTEs have no struct-page rmap semantics and should never have been
> tracked by page table check. Skip them in both the set and clear paths so
> the counters stay balanced (always zero) for PFN-mapped pages, regardless
> of how the architecture defines pte_user_accessible_page(). pte_special()
> is available generically (it is a no-op returning false on architectures
> without ARCH_HAS_PTE_SPECIAL), so this is a single, arch-independent fix.

Relying on pte_special() is usually a sign that something is shaky, as it
silently misses architectures that don't select CONFIG_ARCH_HAS_PTE_SPECIAL:
there, pte_special() always returns false and the new check never triggers.
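
To illustrate the concern, here is a toy userspace model, not kernel code.
Every toy_* name is hypothetical, and the "map, zap, remap, free without a
final clear" sequence is only an assumed stand-in for the imbalance described
in the patch. It shows that a pte_special()-based skip keeps the counter at
zero only when the architecture actually has a special bit:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy PTE with only the bits this model needs (hypothetical layout). */
struct toy_pte {
	bool present;
	bool user;
	bool special;
};

/* Toy stand-in for the per-page file_map_count. */
struct toy_page {
	int file_map_count;
};

/* Like pte_special(): always false when the arch has no special bit. */
static bool toy_pte_special(struct toy_pte pte, bool arch_has_special)
{
	return arch_has_special && pte.special;
}

/* Tracked if PRESENT and USER, unless the PTE is special (the proposed skip). */
static bool toy_tracked(struct toy_pte pte, bool arch_has_special)
{
	return pte.present && pte.user && !toy_pte_special(pte, arch_has_special);
}

static void toy_set(struct toy_page *page, struct toy_pte pte,
		    bool arch_has_special)
{
	if (toy_tracked(pte, arch_has_special))
		page->file_map_count++;
}

static void toy_clear(struct toy_page *page, struct toy_pte pte,
		      bool arch_has_special)
{
	if (!toy_tracked(pte, arch_has_special))
		return;
	assert(page->file_map_count > 0);	/* toy underflow guard */
	page->file_map_count--;
}

/*
 * Assumed sequence: map, zap, remap, then free without a final clear.
 * Returns the counter value the free path would see.
 */
int toy_count_at_free(bool arch_has_special)
{
	struct toy_page page = { .file_map_count = 0 };
	/* A PFN mapping can only carry the special bit if the arch has one. */
	struct toy_pte pte = {
		.present = true,
		.user = true,
		.special = arch_has_special,
	};

	toy_set(&page, pte, arch_has_special);
	toy_clear(&page, pte, arch_has_special);
	toy_set(&page, pte, arch_has_special);
	return page.file_map_count;
}
```

In this model toy_count_at_free(true) returns 0, while toy_count_at_free(false)
returns 1: without a special bit the skip is a no-op and the stray count
survives to the free path.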

I assume the architectures without CONFIG_ARCH_HAS_PTE_SPECIAL (loongarch32?)
do not support CONFIG_PAGE_TABLE_CHECK:

arch/arm64/Kconfig:     select ARCH_SUPPORTS_PAGE_TABLE_CHECK
arch/powerpc/Kconfig:   select ARCH_SUPPORTS_PAGE_TABLE_CHECK   if !HUGETLB_PAGE
arch/riscv/Kconfig:     select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
arch/s390/Kconfig:      select ARCH_SUPPORTS_PAGE_TABLE_CHECK
arch/x86/Kconfig:       select ARCH_SUPPORTS_PAGE_TABLE_CHECK   if X86_64
mm/Kconfig.debug:       depends on ARCH_SUPPORTS_PAGE_TABLE_CHECK

Can we somehow enforce that CONFIG_ARCH_HAS_PTE_SPECIAL is expected, so that
anybody enabling ARCH_SUPPORTS_PAGE_TABLE_CHECK on a new architecture is aware
of this?

For example, through a BUILD_BUG_ON?
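
Something along these lines, shown here only as a userspace sketch. The TOY_*
macros are hypothetical stand-ins for the Kconfig symbols and for the kernel's
BUILD_BUG_ON(), and toy_page_table_check_init() is an invented name. In the
kernel I would expect the equivalent to be a
BUILD_BUG_ON(!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) somewhere in
mm/page_table_check.c, but the exact placement is an assumption:

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdbool.h>

/* Stand-ins for Kconfig symbols: 1 = selected, 0 = not selected. */
#define TOY_PAGE_TABLE_CHECK		1
#define TOY_ARCH_HAS_PTE_SPECIAL	1

/* Userspace stand-in for BUILD_BUG_ON(): break the build if cond is true. */
#define TOY_BUILD_BUG_ON(cond) \
	static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Hypothetical init hook; it compiles only if the dependency holds. */
bool toy_page_table_check_init(void)
{
	/* Page table check skips special PTEs, so it needs a special bit. */
	TOY_BUILD_BUG_ON(TOY_PAGE_TABLE_CHECK && !TOY_ARCH_HAS_PTE_SPECIAL);
	return true;
}
```

Flipping TOY_ARCH_HAS_PTE_SPECIAL to 0 makes the compile fail at the
assertion, which is the kind of early, loud failure I have in mind for anyone
wiring up a new architecture.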

-- 
Cheers,

David



More information about the linux-riscv mailing list