[PATCH 3/4] maple_tree: use the correct min to calculate split

Liam R. Howlett Liam.Howlett at oracle.com
Thu Nov 7 19:19:37 PST 2024


* Wei Yang <richard.weiyang at gmail.com> [241107 21:55]:
> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
> >* Wei Yang <richard.weiyang at gmail.com> [241019 22:46]:
> >> The check in mab_calc_split() is to make sure the gap between
> >> [0, split] won't be too small. But we don't pass the correct min.
> >> 
> >> First we may encounter a pivot[split] smaller than min. For example:
> >> 
> >>        mt_init_flags(mt, 0);
> >>        for (count = 0; count <= 240; count++) {
> >>                mas_set(&mas, count);
> >>                mas_store(&mas, xa_mk_value(count));
> >>        }
> >> 
> >> On splitting root for storing 240, the pivot[split] is smaller than min.
> >> This result a huge (pivot[split] - min).
> >
> >This is fine.
> >
> >There is an open work item to make it more accurate at higher levels,
> >but it's not a problem as it is.
> >
> >Each level upwards needs a better 'minimum span', meaning that the node
> >should have at least mas.min - mas.min + level * something.
> 
> Don't follow this. 
> 
> First, I guess it is mas.max - mas.min?
> 

No, the idea is that we could do better if we could keep the data more
densely packed.

If we have singletons (all range of 1), and we have 10,000 singletons,
then a parent node with 0 - 128 and a sibling of 129-257 makes no sense,
there is nothing that can be inserting into the parents last two slots
- we can't split singletons smaller so we should have done a better job
at splitting.

If you think of how this grows today, we split then rebalance until the
left node is full and we have to split again.  This could be better
optimised to avoid multiple rebalances by doing something to push more
data to the left based on the span of the children and the nodes.  But
we'd want to keep enough so some modifications to each node doesn't
cause a rebalance in the opposite direction.

So a minimum span of parent nodes one level up from the leaves would be
16 * 10 = 160, so when we split we can push more to the left to try and
get to 160 while keeping the right node sufficient.  The math of the
span would change based on how far up the tree we go, but would be
easier to maintain.

I'd probably still want to split it 50/50 the first time, but it might
make sense to push more on the next rebalance.  so 10 => 6|5, 6|10 =>
9|5, but maybe it's not worth the effort.  I'm starting to think so,
considering it's been ineffective to date.  There is a chance that the
active area gets moved to the more-full node and ends up splitting
sooner anyways..

All of this will probably change when dense nodes come anyways.

> Then there is sth related to level. But I don't see level is used for
> calculation. Would you mind sharing more your original thought?
> 
> >
> >It works today for leaves, somewhat.
> >
> 
> To me it works by coincidence.

Seems like it.

> 
> The condition check is:
> 
>   (bn->pivot[split] - min) < (slot_count - 1) 
> 
> while slot_count is a constent, 16 for leaf node.
> 
> And we always pass 0 as min, the condition for leaf is:
> 
>   bn->pivot[split] < 15
> 
> For the mmap world, per my understanding, we won't expect an address is smaller
> than 15.

There are other users, but I think we should just drop this attempt at
fancy stuff.

> 
> >> 
> >> Second prev_l_mas.min is not initialized for the first iteration. This
> >> means on splitting leaf node, this value is mostly taking no effect.
> >
> >No, it is set to 0.  Not initialized would cause random data loss.
> >
> >See MA_STATE() in the header.
> >
> 
> Right, I want to say not initialized to the value we want.
> 
> Will be more careful next time.
> 
> >> 
> >> Since we are now calculating the split of mas->node, we should use the
> >> mas->min instead of prev_l_mas.min.
> >
> >This sounds reasonable, but considering what this number is used for, I
> >don't see how it is working as well as it is today.  I will need to look
> >deeper into this.
> >
> >> 
> >> Signed-off-by: Wei Yang <richard.weiyang at gmail.com>
> >> CC: Liam R. Howlett <Liam.Howlett at Oracle.com>
> >> CC: Sidhartha Kumar <sidhartha.kumar at oracle.com>
> >> CC: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
> >> ---
> >>  lib/maple_tree.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> >> index 894dc5e9436e..c2d4b188646c 100644
> >> --- a/lib/maple_tree.c
> >> +++ b/lib/maple_tree.c
> >> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
> >>  		if (mas_push_data(mas, height, &mast, false))
> >>  			break;
> >>  
> >> -		split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
> >> +		split = mab_calc_split(mas, b_node, &mid_split, mas->min);
> >>  		mast_split_data(&mast, mas, split);
> >>  		/*
> >>  		 * Usually correct, mab_mas_cp in the above call overwrites
> >> -- 
> >> 2.34.1
> >> 
> 
> -- 
> Wei Yang
> Help you, Help me



More information about the maple-tree mailing list