Maple Tree Work

Liam R. Howlett Liam.Howlett at Oracle.com
Fri Jul 7 09:38:15 PDT 2023


Hello,

I have received enquiries about the status of features for the maple
tree as well as requests for work.  I've decided to track this on the
maple tree mailing list [1] for better coordination and open discussion.

Please keep all discussions on the maple tree mailing list.

I'll send out an updated list monthly with names against people working
on them to avoid work duplication, so don't work on something with a
name next to it.

If you want to work on something then please respond to this email so
everyone knows and I can add your name to the item.  Feel free to ask
questions to clarify the feature or discuss directions.

Likewise, if you cannot work on anything anymore then let me know so
others can.

If you have any ideas, then please email the list.  We can discuss it
and I can add it to the list.

Please sign up to the maple tree mailing list [1] to get these emails.

Features:
 - Better preallocation calculations - Liam R. Howlett 2023/07/07
 	Without tracking the tree status on the walk down, we can
	partially walk down to figure out the type of write and which
	'worst case' a write will cause.  Using this "write type"
	information, the preallocations can be a better estimate. v2 was
	sent to the mailing list [2].

 - mas_store_gfp() align with mas_prealloc()
	Right now mas_store_gfp() is going to allocate once it figures
	out what to do and it does the worst case allocations.  This is
	inefficient and can easily be done better by doing more of what
	mas_prealloc()/mas_store_prealloc() does - or at least will be
	doing once the 'Better preallocation calculations' lands.

 - Tracking tree status on walks down
 	Track in the maple state when the last minimum nodes, space for
	2, or space for 3 slots occurred (height in the tree).  This
	will allow for a better estimate on how many nodes are needed
	for a write.  This can be complicated on next/prev walking, etc.
	It has to be done in read mode since some walk may have been
	done before deciding to write - see mmap_region(), for example.

 - Store type (enum for store type?)
	Extending the "Tracking tree status on walks down", the
	information can then be used to determine what operation is
	needed during a mas_prealloct(), etc.  The operation can then be
	stored in the maple state to continue on a mas_store_prealloc().
	Obviously, this would work out well if we have mas_store_gfp()
	using the same functionality as mentioned above.

 - Full count/flag & Dense nodes
 	There is a bit that exists in the pointer reserved to indicate
	there is no NULLs in the leaves below.  This feature is mainly
	for trees that are non-alloc trees.  We can then know there is
	at least one singleton that can be stored below this entry.
	This is coupled with restoring Dense nodes as a potential node
	type.  The tricky part is deciding on when to switch to dense
	nodes/back from dense nodes (all entries to dense nodes must be
	singletons!).  See mte_set_full(), mte_clear_full(),
	mte_has_null() which use MAPLE_ENODE_NULL.

 - Fork & Dup tree + Delete DONT_COPY
 	This is to optimize dup_mmap() in kernel/fork.c, but other
	users may want faster duplications of a tree.
	This should be faster than building a tree in bulk mode.  The
	idea is to just copy each node and replace the pointers in,
	probably, a BFS order.  Once the leaves are reached, the VMAs
	will be replaced by the copies made in fork, unless DONT_COPY is
	set, in which case the VMA will be deleted from the copied tree.
	DONT_COPY is not common and since the tree isn't visible, all
	nodes should be available for reuse (no RCU worries).

 - Push reuse parent
 	During an insert, new nodes can be "pushed" left or right -
	see mas_push_data().  On a left push, it may be possible to
	reuse the node by making the write like an 'append'.  This may
	also be possible to be done during a split operation when the
	write extends the last slot. This needs examining to ensure RCU
	safety.

Larger Features:
 - 64 bit stores on 32 bit arch
 	A number of users want to use the maple tree, but cannot because
	they need a storage medium that supports 64 bit stores on 32 bit
	arch.

 - wr_mas with alloc instead of mas
 	Internally, we have a write maple state, but the maple state
	currently holds the allocations.  I'm not sure if this is worth
	it, and it will probably need "Tracking tree status on walks
	down" so that preallocations/allocations can be accurate.

 - Big Dense Node Type
 	There are various node types that could be added to the maple
	tree.  One of the most useful is probably a 'big dense' with the
	node being a 4k block of singletons.  This would have to come
	after the dense node type as dense enables more users and helps
	scope this work.

 - Judy Array
	The Judy Array is another fancy data structure.  Mathieu
	Desnoyers has enquired and spoken to us about incorporating judy
	arrays as a node type or other features he has in the judy
	array.  This is in the early thought stages.

Please note that any new features will need test cases added to ensure
they function correctly.

[1] https://lists.infradead.org/mailman/listinfo/maple-tree
[2] http://lists.infradead.org/pipermail/maple-tree/2023-June/002599.html




More information about the maple-tree mailing list