[PATCH v1 09/11] mm/memory: optimize fork() with PTE-mapped THP
Ryan Roberts
ryan.roberts at arm.com
Tue Jan 23 04:28:25 PST 2024
On 23/01/2024 12:19, David Hildenbrand wrote:
> [...]
>
>>
>> I wrote some documentation for this (based on Matthew's docs for set_ptes()) in
>> my version. Perhaps it makes sense to add it here, given this is overridable by
>> the arch.
>>
>> /**
>>  * wrprotect_ptes - Write protect a consecutive set of pages.
>>  * @mm: Address space that the pages are mapped into.
>>  * @addr: Address of first page to write protect.
>>  * @ptep: Page table pointer for the first entry.
>>  * @nr: Number of pages to write protect.
>>  *
>>  * May be overridden by the architecture, else implemented as a loop over
>>  * ptep_set_wrprotect().
>>  *
>>  * Context: The caller holds the page table lock. The PTEs are all in the same
>>  * PMD.
>>  */
>>
>
> I could have sworn I had documentation for this at some point. Let me add some, thanks.
>
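For reference, the generic fallback that the kernel-doc above describes can be modelled in userspace roughly as follows. This is only a sketch: pte_t, struct mm_struct, PAGE_SIZE, TOY_PTE_WRITE and the body of ptep_set_wrprotect() are simplified stand-ins invented for the example, not the kernel definitions.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define TOY_PTE_WRITE 0x2UL	/* hypothetical write-permission bit */

/* Userspace stand-ins for the kernel types; not the real definitions. */
typedef struct { unsigned long val; } pte_t;
struct mm_struct { int unused; };

/* Mock of the per-entry helper: clear the write bit of a single PTE. */
static void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
			       pte_t *ptep)
{
	(void)mm;
	(void)addr;
	ptep->val &= ~TOY_PTE_WRITE;
}

/* Generic fallback: write-protect nr consecutive entries (nr >= 1). */
static void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
			   pte_t *ptep, unsigned int nr)
{
	for (;;) {
		ptep_set_wrprotect(mm, addr, ptep);
		if (--nr == 0)
			break;
		ptep++;
		addr += PAGE_SIZE;
	}
}
```

An architecture that can update several entries at once would override wrprotect_ptes() itself; callers only rely on all nr entries ending up write-protected.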
> [...]
>
>>> +
>>> +	/*
>>> +	 * If we likely have to copy, just don't bother with batching. Make
>>> +	 * sure that the common "small folio" case stays as fast as possible
>>> +	 * by keeping the batching logic separate.
>>> +	 */
>>> +	if (unlikely(!*prealloc && folio_test_large(folio) && max_nr != 1)) {
>>> +		nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr);
>>> +		if (folio_test_anon(folio)) {
>>> +			folio_ref_add(folio, nr);
>>> +			if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page,
>>> +								  nr, src_vma))) {
>>
>> What happens if it's not the first page of the batch that fails here? Aren't you
>> signalling that you need a prealloc'ed page for the wrong pte? Shouldn't you
>> still batch copy all the way up to the failing page first? Perhaps it all comes
>> out in the wash and these events are so infrequent that we don't care about the
>> lost batching opportunity?
>
> I assume you mean the weird corner case that some folio pages in the range have
> PAE set, others don't -- and the folio may be pinned.
>
> In that case, we fall back to individual pages, and might have preallocated a
> page although we wouldn't have to preallocate one for processing the next page
> (that doesn't have PAE set).
>
> It should all work, although not optimized to the extreme, and as it's a corner
> case, we don't particularly care. Hopefully, in the future we'll only have a
> single PAE flag per folio.
>
> Or am I missing something?
No, your explanation makes sense. Just wanted to check this all definitely
worked, because the flow is slightly different to my previous version that was
doing try_dup_rmap page-by-page.
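
To make the corner case concrete, here is a toy userspace model of the flow as I understand it from the discussion above. It is not the kernel code: struct toy_folio, toy_try_dup_batch() and toy_copy_batch() are names made up for the sketch, and the per-page flags only approximate PAE and pinning. The point it shows is that the batched dup is all-or-nothing, so a failure on any page of the batch drops the extra references again and returns -EAGAIN for the whole batch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_PAGES 16

/* Toy model of a large anon folio; field names are illustrative only. */
struct toy_folio {
	int refcount;
	bool maybe_pinned;			/* models a possibly pinned folio */
	bool anon_exclusive[TOY_MAX_PAGES];	/* models per-page PAE */
	int mapcount[TOY_MAX_PAGES];
};

/*
 * Models the all-or-nothing rmap duplication: if the folio may be pinned
 * and any page in the range is exclusive, fail without touching anything.
 */
static int toy_try_dup_batch(struct toy_folio *f, int first, int nr)
{
	int i;

	if (f->maybe_pinned)
		for (i = first; i < first + nr; i++)
			if (f->anon_exclusive[i])
				return -EBUSY;

	for (i = first; i < first + nr; i++) {
		f->anon_exclusive[i] = false;	/* page is now shared */
		f->mapcount[i]++;
	}
	return 0;
}

/*
 * Models the batched path: returns the number of entries shared with the
 * child, or -EAGAIN to ask the caller to retry page by page.
 */
static int toy_copy_batch(struct toy_folio *f, int first, int nr)
{
	f->refcount += nr;
	if (toy_try_dup_batch(f, first, nr)) {
		f->refcount -= nr;	/* undo, nothing was mapped */
		return -EAGAIN;
	}
	return nr;
}
```

In this model a failing page in the middle of the batch costs the batching opportunity for the pages before it, but no state is left half-updated, which matches the "works, just not optimized to the extreme" conclusion.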
>
>>
>>> +				folio_ref_sub(folio, nr);
>>> +				return -EAGAIN;
>>> +			}
>>> +			rss[MM_ANONPAGES] += nr;
>>> +			VM_WARN_ON_FOLIO(PageAnonExclusive(page), folio);
>>> +		} else {
>>> +			folio_ref_add(folio, nr);
>>
>> Perhaps hoist this out to immediately after folio_pte_batch() since you're
>> calling it on both branches?
>
> Makes sense, thanks.
>
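
The hoisting suggested above would look roughly like this in a toy userspace model. struct hoist_folio and toy_share_batch() are hypothetical names for the sketch, and the dup_fails flag simply stands in for folio_try_dup_anon_rmap_ptes() failing; the actual patch may end up structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a folio; only what the example needs. */
struct hoist_folio {
	int refcount;
	bool is_anon;
	bool dup_fails;	/* forces the anon rmap duplication to fail */
};

/* Takes the references once, before branching on the folio type. */
static int toy_share_batch(struct hoist_folio *f, int nr)
{
	f->refcount += nr;		/* hoisted: common to both branches */
	if (f->is_anon) {
		if (f->dup_fails) {
			f->refcount -= nr;	/* undo on failure */
			return -EAGAIN;
		}
		/* anon rmap accounting would go here */
	} else {
		/* file-backed rmap and accounting would go here */
	}
	return nr;
}
```

The failure path still has to drop the references it took, so the hoist only removes the duplicated add, not the matching sub.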