[RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory

Mon Apr 17 09:15:16 PDT 2023

On 17/04/2023 16:44, David Hildenbrand wrote:
>>>>>
>>>>> So what should be safe is replacing all sub-pages of a folio that are marked
>>>>> "maybe shared" by a new folio under PT lock. However, I wonder if it's really
>>>>> worth the complexity. For THP we were happy so far to *not* optimize this,
>>>>> implying that maybe we shouldn't worry about optimizing the fork() case for
>>>>> now that heavily.
>>>>
>>>> I don't have the exact numbers to hand, but I'm pretty sure I remember enabling
>>>> large copies was contributing a measurable amount to the performance
>>>> improvement. (Certainly, the zero-page copy case is definitely a big
>>>> contributor.) I don't have access to the HW at the moment but can rerun later
>>>> with and without to double check.
>>>
>>> In which test exactly? Some micro-benchmark?
>>
>> The kernel compile benchmark that I quoted numbers for in the cover letter. I
>> have some trace points (not part of the submitted series) that tell me how many
>> mappings of each order we get for each code path. I'm pretty sure I remember all
>> of these 4 code paths contributing non-negligible amounts.
> 
> Interesting! It would be great to see if there is an actual difference after
> patch #10 was applied without the other COW replacement.
> 

I'll aim to get some formal numbers when I next have access to the HW.