[RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory

David Hildenbrand david at redhat.com
Wed Apr 19 03:51:02 PDT 2023


> I'm looking to fix this problem in my code, but am struggling to see how the
> current code is safe. I'm thinking about the following scenario:
> 

Let's see :)

>   - A page is CoW mapped into processes A and B.
>   - The page takes a fault in process A, and do_wp_page() determines that it is
>     "maybe-shared" and therefore must copy. So drops the PTL and calls
>     wp_page_copy().

Note that before calling wp_page_copy(), we do a folio_get(folio); 
Further, the page table reference is only dropped once we actually 
replace the page in the page table. So while in wp_page_copy(), the 
folio should have at least 2 references if the page is still mapped.

>   - Process B exits.
>   - Another thread in process A faults on the page. This time dw_wp_page()
>     determines that the page is exclusive (due to the ref count), and reuses it,
>     marking it exclusive along the way.

The refcount should not be 1 (other reference from the wp_page_copy() 
caller), so A won't be able to reuse it, and ...

>   - wp_page_copy() from the original thread in process A retakes the PTL and
>     copies the _now exclusive_ page.
> 
> Having typed it up, I guess this can't happen, because wp_page_copy() will only
> do the copy if the PTE hasn't changed and it will have changed because it is now
> writable? So this is safe?

this applies as well. If the pte changed (when reusing due to a write 
failt it's now writable, or someone else broke COW), we back off. For 
FAULT_FLAG_UNSHARE, however, the PTE may not change. But the additional 
reference should make it work.

I think it works as intended. It would be clearer if we'd also recheck 
in wp_page_copy() whether we still don't have an exclusive anon page 
under PT lock --  and if we would, back off.

> 
> To make things more convoluted, what happens if the second thread does an
> mprotect() to make the page RO after its write fault was handled? I think
> mprotect() will serialize on the mmap write lock so this is safe too?

Yes, mprotect() synchronizes that. There are other mechanisms to 
write-protect a page, though, under mmap lock in read mode (uffd-wp). So 
it's a valid concern.

In all of these cases, reuse should be prevented due to the additional 
reference on the folio when entering wp_page_copy() right from the 
start, not turning the page exclusive but instead replacing it by a 
copy. An additional sanity check sounds like the right thing to do.

> 
> Sorry if this is a bit rambly, just trying to make sure I've understood
> everything correctly.

It's a very interesting corner case, thanks for bringing that up. I 
think the old mapcount based approach could have suffered from this 
theoretical issue, but I might be wrong.

-- 
Thanks,

David / dhildenb




More information about the linux-arm-kernel mailing list