[PATCH] arm64: Check pxd_leaf() instead of !pxd_table() while tearing down page tables
Ryan Roberts
ryan.roberts at arm.com
Thu May 15 06:14:23 PDT 2025
On 15/05/2025 14:01, David Hildenbrand wrote:
> On 15.05.25 12:07, Ryan Roberts wrote:
>> On 15/05/2025 09:53, David Hildenbrand wrote:
>>> On 15.05.25 10:47, Dev Jain wrote:
>>>>
>>>>
>>>> On 15/05/25 2:06 pm, David Hildenbrand wrote:
>>>>> On 15.05.25 10:22, Dev Jain wrote:
>>>>>>
>>>>>>
>>>>>> On 15/05/25 1:43 pm, David Hildenbrand wrote:
>>>>>>> On 15.05.25 08:34, Dev Jain wrote:
>>>>>>>> Commit 9c006972c3fe removes the pxd_present() checks because the caller
>>>>>>>> checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller
>>>>>>>> only
>>>>>>>> checks pud_present(); pud_free_pmd_page() recurses on each pmd through
>>>>>>>> pmd_free_pte_page(), wherein the pmd may be none.
>>>>>>> The commit states: "The core code already has a check for pXd_none()",
>>>>>>> so I assume that assumption was not true in all cases?
>>>>>>>
>>>>>>> Should that one problematic caller then check for pmd_none() instead?
>>>>>>
>>>>>> From what I could gather of Will's commit message, my interpretation is
>>>>>> that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd.
>>>>>> These individually check for pxd_present():
>>>>>>
>>>>>> if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
>>>>>> return 0;
>>>>>>
>>>>>> The problem is that vmap_try_huge_pud will also iterate on pte entries.
>>>>>> So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page
>>>>>> may encounter a none pmd and trigger a WARN.
>>>>>
>>>>> Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.
>>>>>
>>>>> I assume we should either have an explicit pmd_none() check in
>>>>> pud_free_pmd_page() before calling pmd_free_pte_page(), or one in
>>>>> pmd_free_pte_page().
>>>>>
>>>>> With your patch, we'd be calling pte_free_kernel() on a NULL pointer,
>>>>> which sounds wrong -- unless I am missing something important.
>>>>
>>>> Ah thanks, you seem to be right. We will be extracting table from a none
>>>> pmd. Perhaps we should still bail out for !pxd_present() but without the
>>>> warning, which the fix commit used to do.
>>>
>>> Right. We just make sure that all callers of pmd_free_pte_page() already check
>>> for it.
>>>
>>> I'd just do something like:
>>
>> I just reviewed the patch and had the same feedback as David. I agree with the
>> patch below, with some small mods...
>>
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 8fcf59ba39db7..e98dd7af147d5 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
>>> pmd = READ_ONCE(*pmdp);
>>> - if (!pmd_table(pmd)) {
>>> - VM_WARN_ON(1);
>>> - return 1;
>>> - }
>>> + VM_WARN_ON(!pmd_present(pmd));
>>> + VM_WARN_ON(!pmd_table(pmd));
>>
>> You don't need both of these warnings; pmd_table() is only true if the pmd is
>> present (well actually only if it's _valid_ which is more strict than present),
>> so the second one is sufficient on its own.
>
> Ah, right.
>
>>
>>> table = pte_offset_kernel(pmdp, addr);
>>> pmd_clear(pmdp);
>>> @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
>>
>> Given you are removing the runtime check and early return in
>> pmd_free_pte_page(), I think you should modify this function to use the same
>> style too.
>
> BTW, the "return 1" is weird. But looking at x86, we seem to be making a private
> copy of the page table first, to defer freeing the page tables after the TLB flush.
>
> I wonder if there isn't a better way (e.g., clear PUDP + flush tlb, then walk
> over the effectively-disconnected page table). But I'm sure there is a good
> reason for that.
As I understand it, the actual TLB entries should have been invalidated when the
previous mappings were vfree'd. So the single-page __flush_tlb_kernel_pgtable()
calls here are to zap any table entries that may be in the walk cache. We could
do an all-levels TLBI for the entire range, but for a system that doesn't
support the tlbi-range operations, we would end up issuing a tlbi per page
across the whole range, which I think would be much slower than the one tlbi per
pgtable we have here.
Things could be rearranged a bit so that we issue all the tlbis with only a
single set of barriers (currently each __flush_tlb_kernel_pgtable() issues its
own barriers), but I'm not sure how important that micro-optimization is: I
guess we never even call pud_free_pmd_page() in practice, given we have had no
reports of the warning tripping.
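
To put rough numbers on the per-page versus per-pgtable comparison above, here is
a small user-space sketch. It assumes a 4 KiB granule (512 entries per table
level) and the worst case where every pmd entry under the pud points at a pte
table. The shift values are defined locally for illustration, and the helper
names are hypothetical, not kernel functions.

```c
#include <stdint.h>

/* Assumed geometry: 4 KiB granule, 512 entries per table level. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30

/* TLBIs needed if every page in a PUD-sized range is invalidated individually. */
static uint64_t tlbis_per_page(void)
{
	return 1ULL << (PUD_SHIFT - PAGE_SHIFT);
}

/*
 * TLBIs needed with one invalidation per page table being freed:
 * one per pte table hanging off the pmd table (worst case: all
 * entries populated), plus one for the pmd table itself.
 */
static uint64_t tlbis_per_pgtable(void)
{
	uint64_t pte_tables = 1ULL << (PUD_SHIFT - PMD_SHIFT);

	return pte_tables + 1;
}
```

Under those assumptions the per-page approach needs 262144 invalidations for
one pud, against at most 513 for the per-pgtable approach.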
>
>>
>>> next = addr;
>>> end = addr + PUD_SIZE;
>>> do {
>>> - pmd_free_pte_page(pmdp, next);
>>> + if (pmd_present(*pmdp))
>>
>> question: I wonder if it is better to use !pmd_none() as the condition here? It
>> should either be none or a table at this point, so this allows the warning in
>> pmd_free_pte_page() to catch more error conditions. No strong opinion though.
>
> Same here. The existing callers check pmd_present().
Yeah, fair; let's be consistent and use pmd_present().
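
For reference, a user-space model of the control flow being agreed on: the
caller skips entries that are not present, and the callee only warns when it is
handed something that is not a table. This is a sketch, not the actual patch.
Everything prefixed model_ is a hypothetical stand-in, and the model has no
"non-none but not present" state, so it cannot show the difference between
checking pmd_present() and !pmd_none().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the states a pmd entry can be in. */
enum model_pmd { MODEL_PMD_NONE, MODEL_PMD_TABLE, MODEL_PMD_LEAF };

static int model_warnings;	/* counts what VM_WARN_ON() would report */
static int model_tables_freed;	/* counts pte tables that would be freed */

static bool model_pmd_present(enum model_pmd pmd)
{
	return pmd != MODEL_PMD_NONE;
}

static bool model_pmd_table(enum model_pmd pmd)
{
	return pmd == MODEL_PMD_TABLE;
}

/* Callee: warn on a non-table entry, with no runtime early return. */
static int model_pmd_free_pte_page(enum model_pmd *pmdp)
{
	if (!model_pmd_table(*pmdp))
		model_warnings++;	/* stands in for VM_WARN_ON(!pmd_table(pmd)) */
	else
		model_tables_freed++;	/* stands in for pte_free_kernel() */

	*pmdp = MODEL_PMD_NONE;		/* stands in for pmd_clear() */
	return 1;
}

/* Caller: only descend into entries that are present. */
static void model_pud_free_pmd_page(enum model_pmd *pmds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (model_pmd_present(pmds[i]))
			model_pmd_free_pte_page(&pmds[i]);
	}
}
```

With entries {none, table, none, leaf}, the none entries are skipped without a
warning, the table is freed, and only the unexpected leaf trips the warning.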
Thanks,
Ryan