[PATCH] mm/page_table_check: do not track special (PFN-mapped) PTEs
Andrey Smirnov
andrey.smirnov at siderolabs.com
Mon Jun 8 08:57:58 PDT 2026
The vDSO data store ("[vvar]") special mapping is created as a VM_PFNMAP
mapping and its pages are installed into userspace with vmf_insert_pfn(),
which produces special PTEs (pte_special()). On x86, arm64 and riscv,
pte_user_accessible_page() only tests the PRESENT/USER bits and does not
exclude special PTEs, so page_table_check accounts these PFN mappings in
the per-page anon/file map counters even though they are not rmap-managed
pages (vm_normal_page() returns NULL for them).

Most of these data pages live in the kernel image and are never freed, so
the stray accounting is invisible. The time-namespace VVAR page is the
exception: it is a real alloc_page() page that is released with
__free_page() in free_time_ns() when the last task of a time namespace
exits. Across the map / unmap / vdso_join_timens() zap transitions the
special-PTE accounting is not balanced for this page, so a non-zero
file_map_count survives to the free path and trips:

  kernel BUG at mm/page_table_check.c:143!
  __page_table_check_zero+0xfb/0x130
  __free_frozen_pages+0x52f/0x650
  free_time_ns+0x85/0xc0
  free_nsproxy+0x7f/0x130
  do_exit+0x313/0xa60
  do_group_exit+0x77/0x90

This is reliably reproducible on x86_64 and arm64 under heavy container/CI
churn that rapidly creates and destroys time namespaces (CLONE_NEWTIME via
runc / docker-init / tini), and was independently reported by syzbot on
riscv. It only manifests when CONFIG_PAGE_TABLE_CHECK is active.

Special PTEs have no struct-page rmap semantics and should never have been
tracked by page table check. Skip them in both the set and clear paths so
the counters stay balanced (always zero) for PFN-mapped pages, regardless
of how the architecture defines pte_user_accessible_page(). pte_special()
is available generically (it is a no-op returning false on architectures
without ARCH_HAS_PTE_SPECIAL), so this is a single, arch-independent fix.
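
The effect of the extra test can be sketched with a small user-space model.
This is not kernel code: the flag bits, struct model_page and every model_*
helper are hypothetical names made up for illustration, and the unequal
set/clear counts are only a stand-in for whichever transition leaves the
real accounting unbalanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PTE flag bits for this model only; not a real arch layout. */
#define MODEL_PTE_PRESENT (1u << 0)
#define MODEL_PTE_USER    (1u << 1)
#define MODEL_PTE_SPECIAL (1u << 2)

/* Stand-in for the per-page counter kept by page table check. */
struct model_page {
	int file_map_count;
};

/* Models a pte_user_accessible_page() that only tests PRESENT and USER. */
static bool model_user_accessible(unsigned int pte)
{
	return (pte & MODEL_PTE_PRESENT) && (pte & MODEL_PTE_USER);
}

static bool model_special(unsigned int pte)
{
	return (pte & MODEL_PTE_SPECIAL) != 0;
}

/* skip_special == true models the patched condition, false the old one. */
static bool model_tracked(unsigned int pte, bool skip_special)
{
	return model_user_accessible(pte) &&
	       !(skip_special && model_special(pte));
}

static void model_pte_set(struct model_page *page, unsigned int pte,
			  bool skip_special)
{
	if (model_tracked(pte, skip_special))
		page->file_map_count++;
}

static void model_pte_clear(struct model_page *page, unsigned int pte,
			    bool skip_special)
{
	if (model_tracked(pte, skip_special))
		page->file_map_count--;
}

/* Install the PTE 'sets' times, clear it 'clears' times, return the count. */
static int model_count_after(unsigned int pte, int sets, int clears,
			     bool skip_special)
{
	struct model_page page = { .file_map_count = 0 };
	int i;

	assert(sets >= 0 && clears >= 0);
	for (i = 0; i < sets; i++)
		model_pte_set(&page, pte, skip_special);
	for (i = 0; i < clears; i++)
		model_pte_clear(&page, pte, skip_special);
	return page.file_map_count;
}
```

In this model a special PTE never touches the counter once skip_special is
true, so the count is zero for any sequence of sets and clears, while PTEs
without the special bit are still counted exactly as before.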

Note that the v7.0 generic vDSO datastore rework in commit 05988dba1179
("vdso/datastore: Allocate data pages dynamically") incidentally avoids
the problem by switching the mapping to VM_MIXEDMAP + vmf_insert_page()
with balanced struct-page accounting. This patch fixes the still-affected
VM_PFNMAP path used by 6.18.y and earlier, and additionally makes
page_table_check robust against any future PFN-mapped user pages.

Fixes: df4e817b7108 ("mm: page table check")
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Thomas Weißschuh <thomas.weissschuh at linutronix.de>
Cc: Andrei Vagin <avagin at gmail.com>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino at arm.com>
Reported-by: syzbot+2b5fe617654be3d8848b at syzkaller.appspotmail.com
Closes: https://github.com/siderolabs/talos/issues/13496
Cc: stable at vger.kernel.org
Signed-off-by: Andrey Smirnov <andrey.smirnov at siderolabs.com>
---
 mm/page_table_check.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 4eeca782b888..ee492d5389b9 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -150,9 +150,16 @@ void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
 	if (&init_mm == mm)
 		return;
 
-	if (pte_user_accessible_page(pte)) {
+	/*
+	 * PFN-mapped (special) PTEs - e.g. the vDSO/time-namespace "[vvar]"
+	 * mapping installed via vmf_insert_pfn() - are not rmap-managed and
+	 * must not be tracked here. Tracking them can leave a non-zero map
+	 * count on a struct page that is later freed (the time namespace VVAR
+	 * page in free_time_ns()), tripping the BUG_ON() in
+	 * __page_table_check_zero().
+	 */
+	if (pte_user_accessible_page(pte) && !pte_special(pte))
 		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
-	}
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
@@ -205,7 +212,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 
 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, ptep_get(ptep + i));
-	if (pte_user_accessible_page(pte))
+	if (pte_user_accessible_page(pte) && !pte_special(pte))
 		page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
 }
 EXPORT_SYMBOL(__page_table_check_ptes_set);
--
2.53.0