[PATCH v1] arm64: mm: Permit PTE SW bits to change in live mappings
Ryan Roberts
ryan.roberts at arm.com
Wed Jun 19 08:58:32 PDT 2024
On 19/06/2024 15:54, Peter Xu wrote:
> Hi, Ryan,
>
> On Wed, Jun 19, 2024 at 01:18:56PM +0100, Ryan Roberts wrote:
>> Previously pgattr_change_is_safe() was overly-strict and complained
>> (e.g. "[ 116.262743] __check_safe_pte_update: unsafe attribute change:
>> 0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
>> in a live PTE. There is no such restriction on SW bits in the Arm ARM.
>>
>> Until now, no SW bits have been updated in live mappings via the
>> set_ptes() route. PTE_DIRTY would be updated live, but this is handled
>> by ptep_set_access_flags() which does not call pgattr_change_is_safe().
>> However, with the introduction of uffd-wp for arm64, there is core-mm
>> code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
>> triggers this false warning.
>>
>> Silence this warning by masking out the SW bits during checks.
>>
>> The bug isn't technically in the highlighted commit below, but that's
>> where bisecting would likely lead as it's what made the bug user-visible.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
>> Fixes: 5b32510af77b ("arm64/mm: Add uffd write-protect support")
>> ---
>>
>> Hi All,
>>
>> This applies on top of v6.10-rc4 and it would be good to land this as a hotfix
>> for v6.10 since it's effectively fixing a bug in 5b32510af77b, which was merged
>> for v6.10.
>>
>> I've only been able to trigger this occasionally by running the mm uffd
>> selftests, when swap is configured to use a small (64M) zRam disk. With this fix
>> applied I can no longer trigger it.
>
> Totally not familiar with the arm64 pgtable checker here, but I'm just
> wondering how the swap affected this, as I see there's:
>
> /* creating or taking down mappings is always safe */
> if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
> return true;
>
> Should pte_valid() always report false on swap entries? Does it mean that
> it'll always report PASS for anything switch from/to a swap entry for the
> checker?
Yes, that's correct: swap ptes are invalid from the HW's point of view, so you
can always safely change their value from anything to anything (as long as the
valid bit remains 0).
>
> I assume that's also why you didn't cover bit 3 (uffd-wp swap bit on arm64,
> per my read in your previous series), but I don't think I'm confident on my
> understanding yet. It might be nice to mention how that was triggered in
> the commit message from that regard.
Bit 3 is the uffd-wp bit in swap ptes; bit 58 is the uffd-wp bit for valid ptes.
Here we are only concerned with valid ptes. Yes, it's a mess ;-)
>
>>
>> Thanks,
>> Ryan
>>
>> arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>> arch/arm64/mm/mmu.c | 3 ++-
>> 2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 9943ff0af4c9..1f60aa1bc750 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -170,6 +170,7 @@
>> #define PTE_CONT (_AT(pteval_t, 1) << 52) /* Contiguous range */
>> #define PTE_PXN (_AT(pteval_t, 1) << 53) /* Privileged XN */
>> #define PTE_UXN (_AT(pteval_t, 1) << 54) /* User XN */
>> +#define PTE_SWBITS_MASK _AT(pteval_t, (BIT(63) | GENMASK(58, 55)))
>>
>> #define PTE_ADDR_LOW (((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
>> #ifdef CONFIG_ARM64_PA_BITS_52
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index c927e9312f10..353ea5dc32b8 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -124,7 +124,8 @@ bool pgattr_change_is_safe(u64 old, u64 new)
>> * The following mapping attributes may be updated in live
>> * kernel mappings without the need for break-before-make.
>> */
>> - pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
>> + pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
>> + PTE_SWBITS_MASK;
>
> When applying the uffd-wp bit, normally we shouldn't need this as we'll
> need to do BBM-alike ops to avoid concurrent HW A/D updates. E.g.
> change_pte_range() uses the ptep_modify_prot_* APIs.
>
> But indeed at least unprotect / clear-uffd-bit doesn't logically need that,
> we already do that in e.g. do_wp_page(). From that POV it makes sense to
> me, as I also don't see why soft-bits are forbidden to be updated on ptes
> if HWs ignore them as a pretty generic concept. Just want to double check
> with you.
This bug was indeed triggering from do_wp_page() as you say, and I was
considering sending out a separate patch to change that code to use the
ptep_modify_prot_start()/ptep_modify_prot_commit() pattern, which transitions
the pte through 0 so that we guarantee not to lose any A/D updates. In the end
I convinced myself that while ptep_get(); pte_clear_uffd_wp(); set_ptes(); is a
troubling pattern, it is safe in this instance because the page is
write-protected, so the HW can't race to set the dirty bit.
The code in question is:
	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
		if (!userfaultfd_wp_async(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_WP);
		}

		/*
		 * Nothing needed (cache flush, TLB invalidations,
		 * etc.) because we're only removing the uffd-wp bit,
		 * which is completely invisible to the user.
		 */
		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));

		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
		/*
		 * Update this to be prepared for following up CoW
		 * handling
		 */
		vmf->orig_pte = pte;
	}
Perhaps we should consider a change to the following style as a cleanup?

	old_pte = ptep_modify_prot_start(vma, addr, pte);
	ptent = pte_clear_uffd_wp(old_pte);
	ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
Regardless, this patch is still a correct and valuable change; the arm64
architecture doesn't care if SW bits are modified in valid mappings, so we
shouldn't be checking for it.
>
>>
>> /* creating or taking down mappings is always safe */
>> if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>> --
>> 2.43.0
>>
>
> When looking at this function I found this and caught my attention too:
>
> /* live contiguous mappings may not be manipulated at all */
> if ((old | new) & PTE_CONT)
> return false;
>
> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
> either hugetlb or mTHP pov. This check may be relevant here as a start.
When transitioning a block of ptes between cont and non-cont, we transition the
block through invalid with tlb invalidation. See contpte_convert().
>
> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
> consider that, and there may be things overlooked at least from my side.
> E.g., consider wr-protect one cont-pte huge pages on hugetlb:
>
> static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
> {
> return huge_pte_wrprotect(pte_mkuffd_wp(pte));
> }
>
> I think it means so far it won't touch the rest cont-ptes but the 1st. Not
> sure whether it'll work if write happens on the rest.
I'm not completely sure I follow your point. I think this should work correctly.
The arm64 huge_pte code knows what size (and level) the huge pte is and spreads
the passed in pte across all the HW ptes.
>
> For mTHPs, they should still be done in change_pte_range() which doesn't
> understand mTHPs yet, so it should loop over all ptes and looks good so
> far, but I didn't further check other than that.
For mTHP, it will JustWork (TM). PTEs are exposed to core-mm with the same
semantics they had before; they all appear independent. The code determines
when it needs to apply or remove the PTE_CONT bit, and in that case the block
is transitioned through the invalid state + tlbi. See contpte_try_fold() and
contpte_try_unfold().
Hope that helps!
Thanks,
Ryan
>
> Thanks,
>