[PATCH 0/5] Track node vacancy to reduce worst case allocation counts

Wei Yang richard.weiyang at gmail.com
Tue Nov 19 01:59:51 PST 2024


On Thu, Nov 14, 2024 at 04:39:00PM -0500, Sid Kumar wrote:
>
>On 11/14/24 12:05 PM, Sidhartha Kumar wrote:
[...]
>> ================ results =========================
>> Bpftrace was used to profile the allocation path for requesting new maple
>> nodes while running the ./mmap1_processes test from mmtests. The two paths
>> for allocation are requests for a single node and the bulk allocation path.
>> The histogram represents the number of calls to these paths and shows the
>> distribution of the number of nodes requested for the bulk allocation path.
>> 
>> 
>> mm-unstable 11/13/24
>> @bulk_alloc_req:
>> [2, 4)                10 |@@@@@@@@@@@@@                                       |
>> [4, 8)                38 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>> [8, 16)               19 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
>> 
>> 
>> mm-unstable 11/13/24 + this series
>> @bulk_alloc_req:
>> [2, 4)                 9 |@@@@@@@@@@                                          |
>> [4, 8)                43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>> [8, 16)               15 |@@@@@@@@@@@@@@@@@@                                  |
>> 
>> We can see the worst case bulk allocations of [8,16) nodes are reduced after
>> this series.
>
>From running the ./malloc1_threads test case we eliminate almost all bulk
>allocation requests that fall between 8 and 16 nodes.
>
>./malloc1_threads -t 8 -s 100
>mm-unstable + this series
>@bulk_alloc_req:
>[2, 4)                 2 |                                                    |
>[4, 8)              3381 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>[8, 16)                2 |                                                    |
>

This is impressive, but one thing is not clear to me.

For mmap related code, we usually have the following usage:

  vma_iter_prealloc(vmi, vma);
    mas_preallocate(vmi->mas, vma);
      MA_WR_STATE(wr_mas, );
      mas_wr_store_type(&wr_mas);       --- (1)
  vma_iter_store(vmi, vma);

Location (1) is where we try to get a better estimate of the allocations.
The estimate is based on walking down the tree to find a suitable node.

In mmap related code, we have usually already walked down the tree to a
leaf, via vma_find() or a related iteration operation, so mas.status is
already ma_active. So I would not expect mas_preallocate() to traverse the
tree again.

But from your result, it seems most cases do traverse the tree again to get a
more precise height.

Which part do you think I have missed?

>
>mm-unstable
>@bulk_alloc_req:
>[2, 4)                 1 |                                                    |
>[4, 8)              1427 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
>[8, 16)             2790 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
>
>> 
>> Sidhartha Kumar (5):
>>    maple_tree: convert mas_prealloc_calc() to take in a maple write state
>>    maple_tree: use height and depth consistently
>>    maple_tree: use vacant nodes to reduce worst case allocations
>>    maple_tree: break on convergence in mas_spanning_rebalance()
>>    maple_tree: add sufficient height
>> 
>>   include/linux/maple_tree.h       |   4 +
>>   lib/maple_tree.c                 |  89 +++++++++++++---------
>>   tools/testing/radix-tree/maple.c | 125 +++++++++++++++++++++++++++++--
>>   3 files changed, 176 insertions(+), 42 deletions(-)
>> 

-- 
Wei Yang
Help you, Help me


