[PATCH v2 6/7] mm: Batch around can_change_pte_writable()

Dev Jain dev.jain at arm.com
Tue May 6 02:16:59 PDT 2025



On 29/04/25 2:57 pm, David Hildenbrand wrote:
> On 29.04.25 11:19, David Hildenbrand wrote:
>>
>>>    #include "internal.h"
>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
>>> -                 pte_t pte)
>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
>>> +                  pte_t pte, struct folio *folio, unsigned int nr)
>>>    {
>>>        struct page *page;
>>> @@ -67,8 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
>>>             * write-fault handler similarly would map them writable without
>>>             * any additional checks while holding the PT lock.
>>>             */
>>> -        page = vm_normal_page(vma, addr, pte);
>>> -        return page && PageAnon(page) && PageAnonExclusive(page);
>>> +        if (!folio)
>>> +            folio = vm_normal_folio(vma, addr, pte);
>>> +        return folio_test_anon(folio) && !folio_maybe_mapped_shared(folio);
>>
>> Oh no, now I spot it. That is horribly wrong.
>>
>> Please understand first what you are doing.
> 
> Also, would expect that the cow.c selftest would catch that:
> 
> "vmsplice() + unmap in child with mprotect() optimization"
> 
> After fork() we have a R/O PTE in the parent. Our child then uses 
> vmsplice() and unmaps the R/O PTE, meaning it is only left mapped by the 
> parent.
> 
> ret = mprotect(mem, size, PROT_READ);
> ret |= mprotect(mem, size, PROT_READ|PROT_WRITE);
> 
> should turn the PTE writable, although it shouldn't.
> 
> If that test case does not detect the issue you're introducing, we 
> should look into adding a test case that detects it.
> 
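
For reference, the mprotect() toggle quoted above can be sketched as a
small userspace helper. This is only a minimal sketch of the two
mprotect() calls on a private anonymous mapping in a single process; it
does not reproduce the fork() + vmsplice() + unmap setup of the cow.c
selftest, and the helper name toggle_protection_and_write() is
hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map npages of private anonymous memory, fault it
 * in, toggle it R/O and back to R/W with mprotect(), then write again.
 * Returns 0 if every step succeeded and the second write is visible,
 * -1 otherwise.
 */
int toggle_protection_and_write(size_t npages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t size;
	char *mem;
	int ret, ok;

	if (page_size <= 0 || npages == 0)
		return -1;
	size = npages * (size_t)page_size;

	mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	/* Populate the page tables so that there are PTEs to change. */
	memset(mem, 0xaa, size);

	/* The same two calls as in the quoted test sequence. */
	ret = mprotect(mem, size, PROT_READ);
	ret |= mprotect(mem, size, PROT_READ | PROT_WRITE);
	if (ret) {
		munmap(mem, size);
		return -1;
	}

	/* The mapping is R/W again, so this write must succeed. */
	memset(mem, 0x55, size);
	ok = (unsigned char)mem[0] == 0x55 &&
	     (unsigned char)mem[size - 1] == 0x55;

	munmap(mem, size);
	return ok ? 0 : -1;
}
```

From userspace the write works either way; what the selftest cares about
is whether the kernel maps the PTE writable directly during mprotect()
or leaves that to the write-fault handler, which this sketch cannot
observe.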

Hi David, I am afraid I don't understand my mistake :( PageAnon(page)
boils down to folio_test_anon(folio). Next, we want to determine whether
the page underlying a PTE is mapped exclusively or not. I approximate
this with folio_maybe_mapped_shared(): if the folio, and therefore all
of its pages, is mapped exclusively, then I convert the entire batch to
writable. If one of the pages is mapped shared, then I do not convert
the batch to writable, thus missing out on the optimization. As far as
I understand, the test failure points out exactly this, right?
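
To make the two checks being compared concrete, here is a toy userspace
model. It is not kernel code: struct toy_folio and all toy_* helpers are
hypothetical, and the flags are plain booleans standing in for
PageAnonExclusive() (per page) and folio_maybe_mapped_shared() (per
folio). The state built by toy_after_fork_child_unmapped() is my reading
of the scenario David quotes above (folio left mapped only by the
parent, pages not marked exclusive), so treat it as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

/* Hypothetical stand-in for a folio: one folio-wide flag, one flag per page. */
struct toy_folio {
	bool anon;                           /* models folio_test_anon() */
	bool maybe_mapped_shared;            /* models folio_maybe_mapped_shared() */
	bool anon_exclusive[TOY_MAX_PAGES];  /* models PageAnonExclusive() per page */
	size_t nr_pages;
};

/* Shape of the original check: anon and this page marked exclusive. */
bool toy_page_check(const struct toy_folio *f, size_t idx)
{
	return f->anon && idx < f->nr_pages && f->anon_exclusive[idx];
}

/* Shape of the check in the patch: anon and folio not mapped shared. */
bool toy_folio_check(const struct toy_folio *f)
{
	return f->anon && !f->maybe_mapped_shared;
}

/* Batch variant of the per-page check: every page in the range must pass. */
bool toy_batch_check(const struct toy_folio *f, size_t start, size_t nr)
{
	if (nr == 0 || start > f->nr_pages || nr > f->nr_pages - start)
		return false;
	for (size_t i = start; i < start + nr; i++)
		if (!toy_page_check(f, i))
			return false;
	return true;
}

/*
 * Assumed state after fork() once the child has unmapped: only the
 * parent maps the folio, but no page is marked exclusive.
 */
struct toy_folio toy_after_fork_child_unmapped(void)
{
	struct toy_folio f = {
		.anon = true,
		.maybe_mapped_shared = false,
		.nr_pages = 4,
		/* anon_exclusive[] is zero-initialised: all false */
	};
	return f;
}
```

In this model the folio-level check passes for the post-fork state while
the per-page check and its batch variant both refuse, so the two are not
interchangeable here.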

Do you suggest an alternate way? My initial approach was to add a new
flag to folio_pte_batch(): FPB_IGNORE_ANON_EXCLUSIVE, but Ryan pointed
out that this looked bad from an API design PoV.



