[RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory
David Hildenbrand
david at redhat.com
Wed Apr 19 03:51:02 PDT 2023
> I'm looking to fix this problem in my code, but am struggling to see how the
> current code is safe. I'm thinking about the following scenario:
>
Let's see :)
> - A page is CoW mapped into processes A and B.
> - The page takes a fault in process A, and do_wp_page() determines that it is
> "maybe-shared" and therefore must copy. So drops the PTL and calls
> wp_page_copy().
Note that before calling wp_page_copy(), we do a folio_get(folio);
Further, the page table reference is only dropped once we actually
replace the page in the page table. So while in wp_page_copy(), the
folio should have at least 2 references if the page is still mapped.
> - Process B exits.
> - Another thread in process A faults on the page. This time dw_wp_page()
> determines that the page is exclusive (due to the ref count), and reuses it,
> marking it exclusive along the way.
The refcount should not be 1 (other reference from the wp_page_copy()
caller), so A won't be able to reuse it, and ...
> - wp_page_copy() from the original thread in process A retakes the PTL and
> copies the _now exclusive_ page.
>
> Having typed it up, I guess this can't happen, because wp_page_copy() will only
> do the copy if the PTE hasn't changed and it will have changed because it is now
> writable? So this is safe?
this applies as well. If the pte changed (when reusing due to a write
failt it's now writable, or someone else broke COW), we back off. For
FAULT_FLAG_UNSHARE, however, the PTE may not change. But the additional
reference should make it work.
I think it works as intended. It would be clearer if we'd also recheck
in wp_page_copy() whether we still don't have an exclusive anon page
under PT lock -- and if we would, back off.
>
> To make things more convoluted, what happens if the second thread does an
> mprotect() to make the page RO after its write fault was handled? I think
> mprotect() will serialize on the mmap write lock so this is safe too?
Yes, mprotect() synchronizes that. There are other mechanisms to
write-protect a page, though, under mmap lock in read mode (uffd-wp). So
it's a valid concern.
In all of these cases, reuse should be prevented due to the additional
reference on the folio when entering wp_page_copy() right from the
start, not turning the page exclusive but instead replacing it by a
copy. An additional sanity check sounds like the right thing to do.
>
> Sorry if this is a bit rambly, just trying to make sure I've understood
> everything correctly.
It's a very interesting corner case, thanks for bringing that up. I
think the old mapcount based approach could have suffered from this
theoretical issue, but I might be wrong.
--
Thanks,
David / dhildenb
More information about the linux-arm-kernel
mailing list