[RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory

Ryan Roberts ryan.roberts at arm.com
Wed Apr 19 04:13:07 PDT 2023


On 19/04/2023 11:51, David Hildenbrand wrote:
>> I'm looking to fix this problem in my code, but am struggling to see how the
>> current code is safe. I'm thinking about the following scenario:
>>
> 
> Let's see :)
> 
>>   - A page is CoW mapped into processes A and B.
>>   - The page takes a fault in process A, and do_wp_page() determines that it is
>>     "maybe-shared" and therefore must copy. So drops the PTL and calls
>>     wp_page_copy().
> 
> Note that before calling wp_page_copy(), we do a folio_get(folio); Further, the
> page table reference is only dropped once we actually replace the page in the
> page table. So while in wp_page_copy(), the folio should have at least 2
> references if the page is still mapped.

Ahh, yes, of course!

> 
>>   - Process B exits.
>>   - Another thread in process A faults on the page. This time dw_wp_page()
>>     determines that the page is exclusive (due to the ref count), and reuses it,
>>     marking it exclusive along the way.
> 
> The refcount should not be 1 (other reference from the wp_page_copy() caller),
> so A won't be able to reuse it, and ...
> 
>>   - wp_page_copy() from the original thread in process A retakes the PTL and
>>     copies the _now exclusive_ page.
>>
>> Having typed it up, I guess this can't happen, because wp_page_copy() will only
>> do the copy if the PTE hasn't changed and it will have changed because it is now
>> writable? So this is safe?
> 
> this applies as well. If the pte changed (when reusing due to a write failt it's
> now writable, or someone else broke COW), we back off. For FAULT_FLAG_UNSHARE,
> however, the PTE may not change. But the additional reference should make it work.
> 
> I think it works as intended. It would be clearer if we'd also recheck in
> wp_page_copy() whether we still don't have an exclusive anon page under PT lock
> --  and if we would, back off.

Yes, agreed. I now have a fix for my code that adds a check during the PTE scan
that each page is non-exclusive. And the scan is (already) done once without the
PTL to determine the size of destination folio, and then again with the lock to
ensure there was no race. I'm not seeing any change in the relative counts of
folio orders (which is expected due to the case being rare).

Thanks!



More information about the linux-arm-kernel mailing list