[PATCH v1] arm64: mm: Permit PTE SW bits to change in live mappings

Ryan Roberts ryan.roberts at arm.com
Thu Jun 20 03:26:07 PDT 2024


On 19/06/2024 20:04, Peter Xu wrote:
> On Wed, Jun 19, 2024 at 04:58:32PM +0100, Ryan Roberts wrote:
>> The code in question is:
>>
>> 	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
>> 		if (!userfaultfd_wp_async(vma)) {
>> 			pte_unmap_unlock(vmf->pte, vmf->ptl);
>> 			return handle_userfault(vmf, VM_UFFD_WP);
>> 		}
>>
>> 		/*
>> 		 * Nothing needed (cache flush, TLB invalidations,
>> 		 * etc.) because we're only removing the uffd-wp bit,
>> 		 * which is completely invisible to the user.
>> 		 */
>> 		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));
>>
>> 		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>> 		/*
>> 		 * Update this to be prepared for following up CoW
>> 		 * handling
>> 		 */
>> 		vmf->orig_pte = pte;
>> 	}
>>
>> Perhaps we should consider a change to the following style as a cleanup?
>>
>> 	old_pte = ptep_modify_prot_start(vma, addr, pte);
>> 	ptent = pte_clear_uffd_wp(old_pte);
>> 	ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
> 
> You're probably right that at least the access bit seems racy to be set
> here, so we may have risk of losing that when a race happened against HW.
> Dirty bit shouldn't be a concern in this case due to missing W bit, iiuc.
> 
> IMO it's a matter of whether we'd like to "make access bit 100% accurate"
> when the race happened, while paying that off with an always slower generic
> path.  Looks cleaner indeed but maybe not very beneficial in reality.
> 
>>
>> Regardless, this patch is still a correct and valuable change; arm64 arch
>> doesn't care if SW bits are modified in valid mappings so we shouldn't be
>> checking for it.
> 
> Agreed.  Let's keep this discussion separate from the original patch if
> that already fixes stuff.
> 
>>
>>>
>>>>
>>>>  	/* creating or taking down mappings is always safe */
>>>>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>>>> --
>>>> 2.43.0
>>>>
>>>
>>> When looking at this function I found this and caught my attention too:
>>>
>>> 	/* live contiguous mappings may not be manipulated at all */
>>> 	if ((old | new) & PTE_CONT)
>>> 		return false;
>>>
>>> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
>>> either hugetlb or mTHP pov.  This check may be relevant here as a start.
>>
>> When transitioning a block of ptes between cont and non-cont, we transition the
>> block through invalid with tlb invalidation. See contpte_convert().
>>
>>>
>>> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
>>> consider that, and there may be things overlooked at least from my side.
>>> E.g., consider wr-protect one cont-pte huge pages on hugetlb:
>>>
>>> static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
>>> {
>>> 	return huge_pte_wrprotect(pte_mkuffd_wp(pte));
>>> }
>>>
>>> I think it means so far it won't touch the rest cont-ptes but the 1st.  Not
>>> sure whether it'll work if write happens on the rest.
>>
>> I'm not completely sure I follow your point. I think this should work correctly.
>> The arm64 huge_pte code knows what size (and level) the huge pte is and spreads
>> the passed in pte across all the HW ptes.
> 
> What I was considering is about wr-protect a 64K cont-pte entry in arm64:
> 
>   UFFDIO_WRITEPROTECT -> hugetlb_change_protection() -> huge_pte_mkuffd_wp()
> 
> What I'm expecting is huge_pte_mkuffd_wp() would wr-protect all ptes, 

Yes, I think this works as expected. huge_pte_mkuffd_wp() is just modifying the
bits in a pte passed on the stack and returns the modified pte. The pgtable is
not touched at this point. The magic happens in set_huge_pte_at() (called by
huge_ptep_modify_prot_commit() in your case), which knows how the abstract "huge
pte" maps to real PMDs or PTEs in the pgtable and applies the passed in pte
value appropriately to all real pmds/ptes (adjusting the pfn as required in the
process).

> but
> looks not right now.  I'm not sure if the HW is able to identify "the whole
> 64K is wr-protected" in this case, rather than "only the 1st pte is
> wr-protected", as IIUC current "pte" points to only the 1st pte entry.

I believe this works as you intended; there is no bug as far as I can see.

> 
> Thanks,
> 




More information about the linux-arm-kernel mailing list