[PROJECT] Is combining allocation and lookup a good idea?

Tue Nov 14 09:57:19 PST 2023

The rbtree combined allocation and lookup.  We followed that lead with
the maple tree and have "allocating nodes" which contain 10 slots. 10
gaps and 9 pivots.  Now I'm wondering if we made the right decision.

If we split "find a free range of address space" into a separate data
structure, we get 16 slots and 15 pivots.  Assuming that a tree node
is 70% full, an allocating tree can store 7 * 7 * 11 = 539 VMAs in a
height-3 tree.  If we have a tree of regular nodes, we can store 1331
in a height-3 tree -- over twice as many!

We should definitely be optimising for lookup, not allocation, so keeping
the allocation information in the tree alongside the lookup information
is probably a really bad idea.  I have lots of bad ideas.

We'd obviously want to keep the two data structures under the same lock,
but they needn't both be a maple tree.  There are definitely downsides
to the current scheme; for example , if you mmap() all of memory, then
munmap() alternate 64MB chunks, mmap() specifying a size of 64M and
alignment of 64M will not find any of the open spaces because we look
for a 128MB gap so that we can guarantee alignment.

I suspect the obvious first place to start is a second maple tree
which contains (start, length) descriptions of the gaps, but honestly
designing the right data structure for this could be a project in itself.
Just separating the allocation from the lookup is a big enough project,
I think.