[PROJECT] Is combining allocation and lookup a good idea?
Matthew Wilcox
willy at infradead.org
Wed Nov 15 06:47:20 PST 2023
On Tue, Nov 14, 2023 at 09:37:40PM -0500, Liam R. Howlett wrote:
> * Matthew Wilcox <willy at infradead.org> [231114 12:57]:
> > The rbtree combined allocation and lookup. We followed that lead with
> > the maple tree and have "allocating nodes" which contain 10 slots. 10
> > gaps and 9 pivots. Now I'm wondering if we made the right decision.
> >
> > If we split "find a free range of address space" into a separate data
> > structure, we get 16 slots and 15 pivots. Assuming that a tree node
> > is 70% full, an allocating tree can store 7 * 7 * 11 = 539 VMAs in a
> > height-3 tree.
>
> 4 height tree, right? root -> 7 -> 7 -> 11 (entries).
... however we call it ;-)
> > If we have a tree of regular nodes, we can store 1331
> > in a height-3 tree -- over twice as many!
>
> In practice, I don't think we see 70% usage on non-leaf nodes.
> Splitting occurs when the leaves are full/over-full, and the same goes
> for parents. So we end up with a potential of the leaves being 50%
> full, but the parents are significantly more dense - data wise. We also
> push left/right at each level.
OK, but this calculation would be similar for both using ar64 and r64
nodes. Let's say the leaves are 60% full and the parents are 90% full.
With an ar64 tree, we get 9 * 9 * 10 = 810. With a purely r64 tree,
we get 14 * 14 * 10 = 1960.
> Realistically, we see 3 high trees (not 4 like above), which have a root
> of 2-4 with close to full children.
The processes which have two elements at root might be able to shrink
the tree by one level. Assuming each leaf is 60% full, a 2-level ar64
tree can have 100 VMAs (10 leaves, each leaf with 10 VMAs) before adding
an extra layer. A 2-level r64 tree can have 16 leaves, each with 10
VMAs for a total of 160 VMAs. Yes, in the range between 160 and 810
there's no difference in the height of the tree.
> > We should definitely be optimising for lookup, not allocation, so keeping
> > the allocation information in the tree alongside the lookup information
> > is probably a really bad idea. I have lots of bad ideas.
>
> I'm not so sure. Our write time is longer than what we are replacing
> and our read time is shorter. I fear this will slow down the write time
> (which is currently our pain point) in hopes of reducing one read
> dereference. But we mostly see level 3 trees and I'm not even sure we'd
> see that many decreases in heigh in the normal VMA range?
I think you're over-reacting to the pain points we see. Nobody's filing
bugs saying "the performance of page faults improved AND I DON'T LIKE IT".
> > I suspect the obvious first place to start is a second maple tree
> > which contains (start, length) descriptions of the gaps, but honestly
> > designing the right data structure for this could be a project in itself.
> > Just separating the allocation from the lookup is a big enough project,
> > I think.
>
> Having 2 trees will double the nodes required to track the data, and
> allocations are also a rather painful point right now. We'd have to
> pre-allocate a lot more nodes here. We would also increase the cache
> usage by ~2x, or at least 50%?
Ooh, no. You'd effectively collapse all the mapped addresses into a
single entry.
$ cat /proc/self/maps
559b9e0ce000-559b9e0d0000 r--p 00000000 fe:01 805306718 /usr/bin/cat
559b9e0d0000-559b9e0d5000 r-xp 00002000 fe:01 805306718 /usr/bin/cat
559b9e0d5000-559b9e0d8000 r--p 00007000 fe:01 805306718 /usr/bin/cat
559b9e0d8000-559b9e0d9000 r--p 00009000 fe:01 805306718 /usr/bin/cat
559b9e0d9000-559b9e0da000 rw-p 0000a000 fe:01 805306718 /usr/bin/cat
559b9f943000-559b9f964000 rw-p 00000000 00:00 0 [heap]
7ffb60000000-7ffb60564000 r--p 00000000 fe:01 6352021 /usr/lib/locale/locale-archive
7ffb606bc000-7ffb606bf000 rw-p 00000000 00:00 0
7ffb606bf000-7ffb606e5000 r--p 00000000 fe:01 548883 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffb606e5000-7ffb6083a000 r-xp 00026000 fe:01 548883 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffb6083a000-7ffb6088e000 r--p 0017b000 fe:01 548883 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffb6088e000-7ffb60892000 r--p 001cf000 fe:01 548883 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffb60892000-7ffb60894000 rw-p 001d3000 fe:01 548883 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffb60894000-7ffb608a1000 rw-p 00000000 00:00 0
7ffb608a7000-7ffb608cb000 rw-p 00000000 00:00 0
7ffb608cb000-7ffb608cc000 r--p 00000000 fe:01 235950 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffb608cc000-7ffb608f1000 r-xp 00001000 fe:01 235950 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffb608f1000-7ffb608fb000 r--p 00026000 fe:01 235950 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffb608fb000-7ffb608fd000 r--p 00030000 fe:01 235950 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffb608fd000-7ffb608ff000 rw-p 00032000 fe:01 235950 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffece153000-7ffece174000 rw-p 00000000 00:00 0 [stack]
7ffece1d8000-7ffece1dc000 r--p 00000000 00:00 0 [vvar]
7ffece1dc000-7ffece1de000 r-xp 00000000 00:00 0 [vdso]
Instead of having 23 entries, we'd have:
0-559b9e0ce000
559b9e0da000-559b9f943000
559b9f964000-7ffb60000000
7ffb60564000-7ffb606bc000
7ffb608a1000-7ffb608a7000
7ffb608ff000-7ffece153000
7ffece174000-7ffece1d8000
7ffece1de000-oo
just 8 entries.
> We still have the full bit, remember. We could utilize the bit when
> searching for a gap to avoid walking down full sub-trees. It might be
> better all around if we just use the non-alloc tree and the start
> location to find gaps and check them on iteration.
I'm hoping we get to a better algorithm for finding a gap than either
"over allocate" or "scan the entire thing". There's plenty of
literature on writing good memory allocators.
> What if we mix R64 and AR64 nodes in the same tree. Searching for
> gaps would either use the size on ar64 or the full bit on r64 nodes.
> We'd switch to R64 when the AR64 nodes are full. This would reduce the
> height when possible, but also allow for some support of size searching
> - we could fall back to walking the tree at lower levels by using the
> full bit to skip nodes that have no NULLs.
Seems more complex, but would have the same advantage in terms of
reducing height ...
More information about the maple-tree
mailing list