[PATCH v1] arm64: mm: Permit PTE SW bits to change in live mappings
Ryan Roberts
ryan.roberts at arm.com
Thu Jun 20 03:26:07 PDT 2024
On 19/06/2024 20:04, Peter Xu wrote:
> On Wed, Jun 19, 2024 at 04:58:32PM +0100, Ryan Roberts wrote:
>> The code in question is:
>>
>> if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
>> if (!userfaultfd_wp_async(vma)) {
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>> return handle_userfault(vmf, VM_UFFD_WP);
>> }
>>
>> /*
>> * Nothing needed (cache flush, TLB invalidations,
>> * etc.) because we're only removing the uffd-wp bit,
>> * which is completely invisible to the user.
>> */
>> pte = pte_clear_uffd_wp(ptep_get(vmf->pte));
>>
>> set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>> /*
>> * Update this to be prepared for following up CoW
>> * handling
>> */
>> vmf->orig_pte = pte;
>> }
>>
>> Perhaps we should consider a change to the following style as a cleanup?
>>
>> old_pte = ptep_modify_prot_start(vma, addr, pte);
>> ptent = pte_clear_uffd_wp(old_pte);
>> ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
>
> You're probably right that at least the access bit seems racy to be set
> here, so we may have risk of losing that when a race happened against HW.
> Dirty bit shouldn't be a concern in this case due to missing W bit, iiuc.
>
> IMO it's a matter of whether we'd like to "make access bit 100% accurate"
> when the race happened, while paying that off with an always slower generic
> path. Looks cleaner indeed but maybe not very beneficial in reality.
>
>>
>> Regardless, this patch is still a correct and valuable change; arm64 arch
>> doesn't care if SW bits are modified in valid mappings so we shouldn't be
>> checking for it.
>
> Agreed. Let's keep this discussion separate from the original patch if
> that already fixes stuff.
>
>>
>>>
>>>>
>>>> /* creating or taking down mappings is always safe */
>>>> if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>>>> --
>>>> 2.43.0
>>>>
>>>
>>> When looking at this function I found this and caught my attention too:
>>>
>>> /* live contiguous mappings may not be manipulated at all */
>>> if ((old | new) & PTE_CONT)
>>> return false;
>>>
>>> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
>>> either hugetlb or mTHP pov. This check may be relevant here as a start.
>>
>> When transitioning a block of ptes between cont and non-cont, we transition the
>> block through invalid with tlb invalidation. See contpte_convert().
>>
>>>
>>> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
>>> consider that, and there may be things overlooked at least from my side.
>>> E.g., consider wr-protect one cont-pte huge pages on hugetlb:
>>>
>>> static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
>>> {
>>> return huge_pte_wrprotect(pte_mkuffd_wp(pte));
>>> }
>>>
>>> I think it means so far it won't touch the rest cont-ptes but the 1st. Not
>>> sure whether it'll work if write happens on the rest.
>>
>> I'm not completely sure I follow your point. I think this should work correctly.
>> The arm64 huge_pte code knows what size (and level) the huge pte is and spreads
>> the passed in pte across all the HW ptes.
>
> What I was considering is about wr-protect a 64K cont-pte entry in arm64:
>
> UFFDIO_WRITEPROTECT -> hugetlb_change_protection() -> huge_pte_mkuffd_wp()
>
> What I'm expecting is huge_pte_mkuffd_wp() would wr-protect all ptes,
Yes, I think this works as expected. huge_pte_mkuffd_wp() is just modifying the
bits in a pte passed on the stack and returns the modified pte. The pgtable is
not touched at this point. The magic happens in set_huge_pte_at() (called by
huge_ptep_modify_prot_commit() in your case), which knows how the abstract "huge
pte" maps to real PMDs or PTEs in the pgtable and applies the passed in pte
value appropriately to all real pmds/ptes (adjusting the pfn as required in the
process).
> but
> looks not right now. I'm not sure if the HW is able to identify "the whole
> 64K is wr-protected" in this case, rather than "only the 1st pte is
> wr-protected", as IIUC current "pte" points to only the 1st pte entry.
I believe this works as you intended; there is no bug as far as I can see.
>
> Thanks,
>
More information about the linux-arm-kernel
mailing list