Does mtree_lookup_walk need to use ma_data_end?

Liam R. Howlett Liam.Howlett at Oracle.com
Mon Oct 30 08:31:01 PDT 2023


* Omar Sandoval <osandov at osandov.com> [231027 16:43]:
> On Fri, Oct 27, 2023 at 10:25:53AM -0400, Liam R. Howlett wrote:
> > * Omar Sandoval <osandov at osandov.com> [231026 20:16]:
> > > Hi,
> > > 
> > > I'm reading some of the maple tree code in order to write drgn helpers
> > > for maple trees, and I was hoping for some clarification on one bit of
> > > code. Specifically, in mtree_lookup_walk:
> > > 
> > > 	end = ma_data_end(node, type, pivots, max);
> > > 	...
> > > 	do {
> > > 		if (pivots[offset] >= mas->index) {
> > > 			max = pivots[offset];
> > > 			break;
> > > 		}
> > > 	} while (++offset < end);
> > > 
> > > So this is looping until the first pivot greater than or equal to the
> > > desired index, stopping after it has checked the last pivot.
> > > 
> > > But we're guaranteed that either the last pivot or the implied maximum
> > > is greater than or equal to the index, so is it necessary to get the
> > > actual end with ma_data_end? I.e., would the following be equivalent?
> > > 
> > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > > index bb24d84a4922..18297c6ed06f 100644
> > > --- a/lib/maple_tree.c
> > > +++ b/lib/maple_tree.c
> > > @@ -3739,7 +3739,7 @@ static inline void *mtree_lookup_walk(struct ma_state *mas)
> > >  		node = mte_to_node(next);
> > >  		type = mte_node_type(next);
> > >  		pivots = ma_pivots(node, type);
> > > -		end = ma_data_end(node, type, pivots, max);
> > > +		end = mt_pivots[type];
> > >  		if (unlikely(ma_dead_node(node)))
> > >  			goto dead_node;
> > >  		do {
> > > 
> > > I have no idea whether this is any better or faster. I guess it does
> > > avoid a few branches and loads in ma_data_end. I'm mainly wondering
> > > whether there's something I'm missing in my understanding.
> > 
> > Your analysis looks correct.  This is probably the case elsewhere in
> > recent code as well (mas_range_walk(), probably).
> 
> Thanks for humoring me! I have another, unrelated question. I wrote some
> test code to populate a tree like:
> 
> 	DEFINE_MTREE(my_mt);
> 	for (i = 0; i < MAPLE_RANGE64_SLOTS * 2 - 1; i++)
> 		mtree_insert(&my_mt, i, xa_mk_value(i), GFP_KERNEL);
> 
> This resulted in a 2-level tree with 3 leaves:
> 
> 	Root pivots:   [14, 23, ULONG_MAX]
> 	Leaf 0 pivots: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
> 	Leaf 0 slots:  [xa_mk_value(0),
> 			xa_mk_value(1),
> 			xa_mk_value(2),
> 			xa_mk_value(3),
> 			xa_mk_value(4),
> 			xa_mk_value(5),
> 			xa_mk_value(6),
> 			xa_mk_value(7),
> 			xa_mk_value(8),
> 			xa_mk_value(9),
> 			xa_mk_value(10),
> 			xa_mk_value(11),
> 			xa_mk_value(12),
> 			xa_mk_value(13),
> 			xa_mk_value(14)]
> 	Leaf 1 pivots: [15, 16, 17, 18, 19, 20, 21, 22, 23]
> 	Leaf 1 slots:  [xa_mk_value(15),
> 			xa_mk_value(16),
> 			xa_mk_value(17),
> 			xa_mk_value(18),
> 			xa_mk_value(19),
> 			xa_mk_value(20),
> 			xa_mk_value(21),
> 			xa_mk_value(22),
> 			xa_mk_value(23)]
> 	Leaf 2 pivots: [24, 25, 26, 27, 28, 29, 30, ULONG_MAX]
> 	Leaf 2 slots:  [xa_mk_value(24),
> 			xa_mk_value(25),
> 			xa_mk_value(26),
> 			xa_mk_value(27),
> 			xa_mk_value(28),
> 			xa_mk_value(29),
> 			xa_mk_value(30),
> 			0]
> 
> I believe the following tree with only two leaves (by taking advantage
> of the implied minimum/maximum) is equivalent:
> 
> 	Root pivots:   [15, ULONG_MAX]
> 	Leaf 0 pivots: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
> 	Leaf 0 slots:  [xa_mk_value(0),
> 			xa_mk_value(1),
> 			xa_mk_value(2),
> 			xa_mk_value(3),
> 			xa_mk_value(4),
> 			xa_mk_value(5),
> 			xa_mk_value(6),
> 			xa_mk_value(7),
> 			xa_mk_value(8),
> 			xa_mk_value(9),
> 			xa_mk_value(10),
> 			xa_mk_value(11),
> 			xa_mk_value(12),
> 			xa_mk_value(13),
> 			xa_mk_value(14),
> 			xa_mk_value(15)]
> 	Leaf 1 pivots: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
> 	Leaf 1 slots:  [xa_mk_value(16),
> 			xa_mk_value(17),
> 			xa_mk_value(18),
> 			xa_mk_value(19),
> 			xa_mk_value(20),
> 			xa_mk_value(21),
> 			xa_mk_value(22),
> 			xa_mk_value(23),
> 	                xa_mk_value(24),
> 			xa_mk_value(25),
> 			xa_mk_value(26),
> 			xa_mk_value(27),
> 			xa_mk_value(28),
> 			xa_mk_value(29),
> 			xa_mk_value(30),
> 			0]
> 
> Did I get something wrong here? Or is it intentional not to do this, or
> maybe not worth the complexity?

In our testing, it has proven beneficial to leave some areas for
expansion/contraction.  In your example this is not possible due to the
linear layout of your values, and this will be addressed with dense
nodes in the future.

In the range storage case, the dense layout above leads to 'jittering'
in many cases.  We end up with 2 full nodes and someone adds one more
entry, then removes one, then adds one, etc.  To avoid this jitter, we
don't always fill nodes - the splitting/combining isn't a strict limit,
but allows for space.

We also re-balance between the nodes and its peers to avoid jittering up
the tree.

Thanks,
Liam




More information about the maple-tree mailing list