[RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory

Ryan Roberts ryan.roberts at arm.com
Wed Apr 19 03:12:09 PDT 2023


Hi David,

On 17/04/2023 15:05, David Hildenbrand wrote:
> [...]
> 
>>> Just a note (that you maybe already know) that we have to be a bit careful in
>>> the wp_copy path with replacing sub-pages that are marked exclusive.
>>
>> Ahh, no I wasn't aware of this - thanks for taking the time to explain it. I
>> think I have a bug.
>>
>> (I'm guessing the GUP fast path assumes that if it sees an exclusive page then
>> that page can't go away? And if I then wp_copy it, I'm breaking the assumption?
>> But surely user space can munmap it at any time and the consequences are
>> similar? It's probably clear that I don't know much about the GUP implementation
>> details...)
> 
> If GUP finds a read-only PTE pointing at an exclusive subpage, it assumes that
> this page cannot randomly be replaced by core MM due to COW. See
> gup_must_unshare(). So it can go ahead and pin the page. As long as user space
> doesn't do something stupid with the mapping (MADV_DONTNEED, munmap()) the
> pinned page must correspond to the mapped page.
> 
> If GUP finds a writeable PTE, it assumes that this page cannot randomly be
> replaced by core MM due to COW -- because writable implies exclusive. See, for
> example the VM_BUG_ON_PAGE() in follow_page_pte(). So, similarly, GUP can simply
> go ahead and pin the page.
> 
> GUP-fast runs lockless, not even taking the PT locks. It syncs against
> concurrent fork() using a special seqlock, and essentially unpins whatever it
> temporarily pinned when it detects that fork() was running concurrently. But it
> might result in some pages temporarily being flagged as "maybe pinned".
> 
> In other cases (!fork()), GUP-fast synchronizes against concurrent sharing (KSM)
> or unmapping (migration, swapout) that implies clearing of the PG_anon_flag of
> the subpage by first unmapping the PTE and conditionally remapping it. See
> mm/ksm.c:write_protect_page() as an example for the sharing side (especially: if
> page_try_share_anon_rmap() fails because the page may be pinned).
> 
> Long story short: replacing a r-o "maybe shared" (!exclusive) PTE is easy.
> Replacing an exclusive PTE (including writable PTEs) requires some work to sync
> with GUP-fast and goes rather in the "maybe just don't bother" terriroty.

I'm looking to fix this problem in my code, but am struggling to see how the
current code is safe. I'm thinking about the following scenario:

 - A page is CoW mapped into processes A and B.
 - The page takes a fault in process A, and do_wp_page() determines that it is
   "maybe-shared" and therefore must copy. So drops the PTL and calls
   wp_page_copy().
 - Process B exits.
 - Another thread in process A faults on the page. This time dw_wp_page()
   determines that the page is exclusive (due to the ref count), and reuses it,
   marking it exclusive along the way.
 - wp_page_copy() from the original thread in process A retakes the PTL and
   copies the _now exclusive_ page.

Having typed it up, I guess this can't happen, because wp_page_copy() will only
do the copy if the PTE hasn't changed and it will have changed because it is now
writable? So this is safe?

To make things more convoluted, what happens if the second thread does an
mprotect() to make the page RO after its write fault was handled? I think
mprotect() will serialize on the mmap write lock so this is safe too?

Sorry if this is a bit rambly, just trying to make sure I've understood
everything correctly.

Thanks,
Ryan




More information about the linux-arm-kernel mailing list