[PATCH v1] arm64: mm: Permit PTE SW bits to change in live mappings

Ryan Roberts ryan.roberts at arm.com
Wed Jun 19 08:58:32 PDT 2024


On 19/06/2024 15:54, Peter Xu wrote:
> Hi, Ryan,
> 
> On Wed, Jun 19, 2024 at 01:18:56PM +0100, Ryan Roberts wrote:
>> Previously pgattr_change_is_safe() was overly-strict and complained
>> (e.g. "[  116.262743] __check_safe_pte_update: unsafe attribute change:
>> 0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
>> in a live PTE. There is no such restriction on SW bits in the Arm ARM.
>>
>> Until now, no SW bits have been updated in live mappings via the
>> set_ptes() route. PTE_DIRTY would be updated live, but this is handled
>> by ptep_set_access_flags() which does not call pgattr_change_is_safe().
>> However, with the introduction of uffd-wp for arm64, there is core-mm
>> code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
>> triggers this false warning.
>>
>> Silence this warning by masking out the SW bits during checks.
>>
>> The bug isn't technically in the highlighted commit below, but that's
>> where bisecting would likely lead, as it's what made the bug user-visible.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
>> Fixes: 5b32510af77b ("arm64/mm: Add uffd write-protect support")
>> ---
>>
>> Hi All,
>>
>> This applies on top of v6.10-rc4, and it would be good to land this as a hotfix
>> for v6.10 since it's effectively fixing a bug in 5b32510af77b, which was merged
>> for v6.10.
>>
>> I've only been able to trigger this occasionally by running the mm uffd
>> selftests, when swap is configured to use a small (64M) zRam disk. With this fix
>> applied I can no longer trigger it.
> 
> Totally not familiar with the arm64 pgtable checker here, but I'm just
> wondering how the swap affected this, as I see there's:
> 
> 	/* creating or taking down mappings is always safe */
> 	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
> 		return true;
> 
> Should pte_valid() always report false on swap entries? Does it mean that
> it'll always report PASS for anything switch from/to a swap entry for the
> checker?

Yes, that's correct; swap ptes are invalid from the HW's point of view, so you
can always safely change their values from anything to anything (as long as the
valid bit remains 0).

> 
> I assume that's also why you didn't cover bit 3 (uffd-wp swap bit on arm64,
> per my read in your previous series), but I don't think I'm confident on my
> understanding yet.  It might be nice to mention how that was triggered in
> the commit message from that regard.

Bit 3 is the uffd-wp bit in swap ptes. Bit 58 is the uffd-wp bit for valid ptes.
Here we are only concerned with valid ptes. Yes, it's a mess ;-)

> 
>>
>> Thanks,
>> Ryan
>>
>>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>  arch/arm64/mm/mmu.c                    | 3 ++-
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 9943ff0af4c9..1f60aa1bc750 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -170,6 +170,7 @@
>>  #define PTE_CONT		(_AT(pteval_t, 1) << 52)	/* Contiguous range */
>>  #define PTE_PXN			(_AT(pteval_t, 1) << 53)	/* Privileged XN */
>>  #define PTE_UXN			(_AT(pteval_t, 1) << 54)	/* User XN */
>> +#define PTE_SWBITS_MASK		_AT(pteval_t, (BIT(63) | GENMASK(58, 55)))
>>
>>  #define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
>>  #ifdef CONFIG_ARM64_PA_BITS_52
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index c927e9312f10..353ea5dc32b8 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -124,7 +124,8 @@ bool pgattr_change_is_safe(u64 old, u64 new)
>>  	 * The following mapping attributes may be updated in live
>>  	 * kernel mappings without the need for break-before-make.
>>  	 */
>> -	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
>> +	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
>> +			PTE_SWBITS_MASK;
> 
> When applying the uffd-wp bit, normally we shouldn't need this as we'll
> need to do BBM-alike ops to avoid concurrent HW A/D updates.  E.g.
> change_pte_range() uses the ptep_modify_prot_* APIs.
> 
> But indeed at least unprotect / clear-uffd-bit doesn't logically need that,
> we already do that in e.g. do_wp_page().  From that POV it makes sense to
> me, as I also don't see why soft-bits are forbidden to be updated on ptes
> if HWs ignore them as a pretty generic concept.  Just want to double check
> with you.

This bug was indeed triggering from do_wp_page() as you say, and I was
considering sending out a separate patch to change that code to use the
ptep_modify_prot_start()/ptep_modify_prot_commit() pattern, which transitions the
pte through 0 so that we guarantee not to lose any A/D updates. In the end I
convinced myself that, while ptep_get(); pte_clear_uffd_wp(); set_ptes(); is a
troubling pattern, it is safe in this instance because the page is
write-protected, so the HW can't race to set the dirty bit.

The code in question is:

	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
		if (!userfaultfd_wp_async(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_WP);
		}

		/*
		 * Nothing needed (cache flush, TLB invalidations,
		 * etc.) because we're only removing the uffd-wp bit,
		 * which is completely invisible to the user.
		 */
		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));

		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
		/*
		 * Update this to be prepared for following up CoW
		 * handling
		 */
		vmf->orig_pte = pte;
	}

Perhaps we should consider a change to the following style as a cleanup?

	old_pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte);
	ptent = pte_clear_uffd_wp(old_pte);
	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, ptent);

Regardless, this patch is still a correct and valuable change; the arm64
architecture doesn't care if SW bits are modified in valid mappings, so we
shouldn't be checking for it.

> 
>>
>>  	/* creating or taking down mappings is always safe */
>>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>> --
>> 2.43.0
>>
> 
> When looking at this function I found this and caught my attention too:
> 
> 	/* live contiguous mappings may not be manipulated at all */
> 	if ((old | new) & PTE_CONT)
> 		return false;
> 
> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
> either hugetlb or mTHP pov.  This check may be relevant here as a start.

When transitioning a block of ptes between cont and non-cont, we transition the
block through the invalid state with TLB invalidation. See contpte_convert().

> 
> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
> consider that, and there may be things overlooked at least from my side.
> E.g., consider wr-protect one cont-pte huge pages on hugetlb:
> 
> static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
> {
> 	return huge_pte_wrprotect(pte_mkuffd_wp(pte));
> }
> 
> I think it means so far it won't touch the rest cont-ptes but the 1st.  Not
> sure whether it'll work if write happens on the rest.

I'm not completely sure I follow your point, but I think this should work
correctly. The arm64 huge_pte code knows what size (and level) the huge pte is
and spreads the passed-in pte across all the HW ptes.

> 
> For mTHPs, they should still be done in change_pte_range() which doesn't
> understand mTHPs yet, so it should loop over all ptes and looks good so
> far, but I didn't further check other than that.

For mTHP, it will JustWork (TM). PTEs are exposed to core-mm with the same
semantics they had before; they all appear independent. The code determines
when it needs to apply or remove the PTE_CONT bit, and in that case the block is
transitioned through the invalid state + tlbi. See contpte_try_fold() and
contpte_try_unfold().

Hope that helps!

Thanks,
Ryan


> 
> Thanks,
> 



