[PATCH] arm64: Check pxd_leaf() instead of !pxd_table() while tearing down page tables

Will Deacon will at kernel.org
Thu May 15 06:21:01 PDT 2025


On Thu, May 15, 2025 at 03:04:50PM +0200, David Hildenbrand wrote:
> On 15.05.25 14:56, Will Deacon wrote:
> > On Thu, May 15, 2025 at 11:32:22AM +0200, David Hildenbrand wrote:
> > > On 15.05.25 11:27, Dev Jain wrote:
> > > > 
> > > > 
> > > > On 15/05/25 2:23 pm, David Hildenbrand wrote:
> > > > > On 15.05.25 10:47, Dev Jain wrote:
> > > > > > 
> > > > > > 
> > > > > > On 15/05/25 2:06 pm, David Hildenbrand wrote:
> > > > > > > On 15.05.25 10:22, Dev Jain wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 15/05/25 1:43 pm, David Hildenbrand wrote:
> > > > > > > > > On 15.05.25 08:34, Dev Jain wrote:
> > > > > > > > > > > Commit 9c006972c3fe removes the pxd_present() checks because the caller
> > > > > > > > > > > checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only
> > > > > > > > > > > checks pud_present(); pud_free_pmd_page() recurses on each pmd through
> > > > > > > > > > > pmd_free_pte_page(), wherein the pmd may be none.
> > > > > > > > > The commit states: "The core code already has a check for pXd_none()",
> > > > > > > > > so I assume that assumption was not true in all cases?
> > > > > > > > > 
> > > > > > > > > Should that one problematic caller then check for pmd_none() instead?
> > > > > > > > 
> > > > > > > > > From what I could gather of Will's commit message, my interpretation is
> > > > > > > > that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd.
> > > > > > > > These individually check for pxd_present():
> > > > > > > > 
> > > > > > > > if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> > > > > > > >        return 0;
> > > > > > > > 
> > > > > > > > The problem is that vmap_try_huge_pud will also iterate on pte entries.
> > > > > > > > So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page
> > > > > > > > may encounter a none pmd and trigger a WARN.
> > > > > > > 
> > > > > > > Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.
> > > > > > > 
> > > > > > > I assume we should either have an explicit pmd_none() check in
> > > > > > > pud_free_pmd_page() before calling pmd_free_pte_page(), or one in
> > > > > > > pmd_free_pte_page().
> > > > > > > 
> > > > > > > With your patch, we'd be calling pte_free_kernel() on a NULL pointer,
> > > > > > > which sounds wrong -- unless I am missing something important.
> > > > > > 
> > > > > > Ah thanks, you seem to be right. We will be extracting table from a none
> > > > > > pmd. Perhaps we should still bail out for !pxd_present() but without the
> > > > > > warning, which the fix commit used to do.
> > > > > 
> > > > > Right. We just make sure that all callers of pmd_free_pte_page() already
> > > > > check for it.
> > > > > 
> > > > > I'd just do something like:
> > > > > 
> > > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > > > index 8fcf59ba39db7..e98dd7af147d5 100644
> > > > > --- a/arch/arm64/mm/mmu.c
> > > > > +++ b/arch/arm64/mm/mmu.c
> > > > > @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long
> > > > > addr)
> > > > > 
> > > > >            pmd = READ_ONCE(*pmdp);
> > > > > 
> > > > > -       if (!pmd_table(pmd)) {
> > > > > -               VM_WARN_ON(1);
> > > > > -               return 1;
> > > > > -       }
> > > > > +       VM_WARN_ON(!pmd_present(pmd));
> > > > > +       VM_WARN_ON(!pmd_table(pmd));
> > > > 
> > > > And also return 1?
> > > 
> > > I'll leave that to Catalin + Will.
> > > 
> > > I'm not a fan of adding runtime overhead for something that should not
> > > happen and should be caught early during testing -> VM_WARN_ON_ONCE().
> > 
> > I definitely think we should return early if the pmd isn't a table.
> > Otherwise, we could end up descending into God-knows-what!
> 
> The question is: how did something that is not a table end up here, and why
> is it valid to check exactly that at runtime. No strong opinion, it just
> feels a bit arbitrary to test for exactly that at runtime if it is
> completely unexpected.

I see it a little bit like type-checking: we could see an invalid entry,
a leaf entry or a table entry and we should only ever dereference the
latter. If the VM_WARN_ON() is justified, then I find it jarring to go
ahead with the dereference regardless of the type.
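
To make the type-checking analogy concrete, here is a minimal user-space
sketch in C. It is a model under stated assumptions, not the arm64 code:
enum entry_kind, struct entry and entry_next_table() are hypothetical names
invented for illustration, and the warned counter merely stands in for
VM_WARN_ON(). The only point it shows is that the warning and the early
return sit on the same branch, so a non-table entry is never dereferenced.

```c
#include <stddef.h>

/* Hypothetical three-way classification of a page-table entry. */
enum entry_kind { ENTRY_NONE, ENTRY_LEAF, ENTRY_TABLE };

/* Hypothetical entry; 'table' is meaningful only for ENTRY_TABLE. */
struct entry {
	enum entry_kind kind;
	struct entry *table;
};

/*
 * Return the next-level table, or NULL if the entry is not a table.
 * '*warned' is incremented where real code might use VM_WARN_ON();
 * either way, we bail out instead of following 'table'.
 */
static struct entry *entry_next_table(const struct entry *slot, int *warned)
{
	struct entry e = *slot;	/* single snapshot, loosely like READ_ONCE() */

	if (e.kind != ENTRY_TABLE) {
		(*warned)++;
		return NULL;
	}

	return e.table;
}
```

With this shape, a none entry and a leaf entry both take the early-return
path, and only a table entry yields a pointer to descend into.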

That said, maybe the VM_WARN_ON() should either be deleted or moved out
to the callers in mm/vmalloc.c?

Will


