[PATCH v6 08/71] Maple Tree: Add new data structure

Liam Howlett liam.howlett at oracle.com
Wed Feb 16 10:25:38 PST 2022


* Mark Hemment <markhemm at googlemail.com> [220216 05:12]:
> Only a few v minor comments.
> 
> On Tue, 15 Feb 2022 at 14:43, Liam Howlett <liam.howlett at oracle.com> wrote:
> >
> > From: "Liam R. Howlett" <Liam.Howlett at Oracle.com>
> >
> > The maple tree is an RCU-safe range based B-tree designed to use modern
> > processor cache efficiently.  There are a number of places in the kernel
> > that a non-overlapping range-based tree would be beneficial, especially
> > one with a simple interface.  The first user that is covered in this
> > patch set is the vm_area_struct, where three data structures are
> > replaced by the maple tree: the augmented rbtree, the vma cache, and the
> > linked list of VMAs in the mm_struct.  The long term goal is to reduce
> > or remove the mmap_sem contention.
> >
> > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> > nodes.  With the increased branching factor, it is significantly shorter than
> > the rbtree so it has fewer cache misses.  The removal of the linked list
> > between subsequent entries also reduces the cache misses and the need to pull
> > in the previous and next VMA during many tree alterations.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
> > Signed-off-by: Liam R. Howlett <Liam.Howlett at oracle.com>
> > Tested-by: David Howells <dhowells at redhat.com>
> > ---
> >  Documentation/core-api/index.rst              |    1 +
> >  Documentation/core-api/maple_tree.rst         |  218 +
> >  MAINTAINERS                                   |   12 +
> >  include/linux/maple_tree.h                    |  683 ++
> >  include/trace/events/maple_tree.h             |  123 +
> >  init/main.c                                   |    2 +
> >  lib/Kconfig.debug                             |   16 +
> >  lib/Makefile                                  |    3 +-
> >  lib/maple_tree.c                              | 6967 +++++++++++++++++
> >  tools/testing/radix-tree/.gitignore           |    2 +
> >  tools/testing/radix-tree/Makefile             |   13 +-
> >  tools/testing/radix-tree/generated/autoconf.h |    1 +
> >  tools/testing/radix-tree/linux/maple_tree.h   |    7 +
> >  tools/testing/radix-tree/maple.c              |   59 +
> >  .../radix-tree/trace/events/maple_tree.h      |    3 +
> >  15 files changed, 8107 insertions(+), 3 deletions(-)
> >  create mode 100644 Documentation/core-api/maple_tree.rst
> >  create mode 100644 include/linux/maple_tree.h
> >  create mode 100644 include/trace/events/maple_tree.h
> >  create mode 100644 lib/maple_tree.c
> >  create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
> >  create mode 100644 tools/testing/radix-tree/maple.c
> >  create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h
> 
> ...
> > +++ b/include/linux/maple_tree.h
> ...
> > +/*
> > + * The Maple Tree squeezes various bits in at various points which aren't
> > + * necessarily obvious.  Usually, this is done by observing that pointers are
> > + * N-byte aligned and thus the bottom log_2(N) bits are available for use.  We
> > + * don't use the high bits of pointers to store additional information because
> > + * we don't know what bits are unused on any given architecture.
> > + *
> > + * Nodes are 256 bytes in size and are also aligned to 256 bytes, giving us 8
> > + * low bits for our own purposes.  Nodes are currently of 4 types:
> 
> Is it worth adding the compile-time check;
>     static_assert(sizeof(struct maple_tree) == 256);
> ?

sizeof(struct maple_node), but yes.  That sounds reasonable.

> ...
> 
> > +struct ma_wr_state {
> > +       struct ma_state *mas;
> > +       struct maple_node *node;
> > +       enum maple_type type;
> > +       unsigned long r_min;
> > +       unsigned long r_max;
> > +       unsigned char offset_end;
> > +       unsigned char node_end;
> > +       unsigned long *pivots;
> > +       unsigned long end_piv;
> > +       void __rcu **slots;
> > +       void *entry;
> > +       void *content;
> > +};
> 
> Minor: Moving the member 'type' to below 'r_max', means it will remove
> the need for padding (reducing size from 88 to 80 byes).  Does this
> matter?  I doubt it.  Better/worse for h/w cache?  Suspect no
> difference.

It wouldn't hurt.  I'll make this change.

> ...
> 
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > @@ -827,6 +834,7 @@ config DEBUG_VM_RB
> >
> >           If unsure, say N.
> >
> > +
> >  config DEBUG_VM_PGFLAGS
> >         bool "Debug page-flags operations"
> >         depends on DEBUG_VM
> 
> Stray blank line.
> ...

Thanks

> 
> > +++ b/lib/maple_tree.c
> 
> > + * Note types:
> > + *  0x??1 = Root
> > + *  0x?00 = 16 bit nodes
> > + *  0x010 = 32 bit nodes
> > + *  0x110 = 64 bit nodes
> > + *
> > + * Slot size and alignment
> > + *  0x??1 : Root
> > + *  0x?00 : 16 bit values, type in 0-1, slot in 2-6
> > + *  0x010 : 32 bit values, type in 0-2, slot in 3-6
> > + *  0x110 : 64 bit values, type in 0-2, slit in 3-6
> > + */
> 
> s/slit/slot/

Thanks

> 
> > +
> > +#define MAPLE_PARENT_ROOT              0x01
> ...
> 
> > +/*
> > + * ma_meta_gap() - Get the largest gap location of a node from the metadat
> 
> s/metadat/metadata/

Thanks

> ...
> 
> > +/*
> > + * mast_rebalace_prev() - Rebalance against the previous node
> 
> s/mast_rebalace_prev/mast_rebalance_prev/

Thanks.

> 
> 
> > + * @mast: The maple subtree state
> > + * @old_l: The encoded maple node to the left (previous node)
> > + */
> > +static inline void mast_rebalance_prev(struct maple_subtree_state *mast,
> > +                                      struct maple_enode *old_l)
> ...
> 
> > +static inline void *mas_state_walk(struct ma_state *mas)
> > +{
> > +       void *entry;
> > +
> > +       entry = mas_start(mas);
> > +       if (mas_is_none(mas))
> > +               return NULL;
> > +
> > +       if (mas_is_ptr(mas))
> > +               return entry;
> > +
> > +       return mtree_range_walk(mas);
> > +}
> 
> A blank line after this mas_state_walk() function would help the formatting.

Thanks

> 
> > +/*
> > + * mtree_lookup_walk() - Internal quick lookup that does not keep maple state up
> > + * to date.
> > + *
> > + * @mas: The maple state.
> > + *
> > + * Note: Leaves mas in undesirable state.
> > + * Return: The entry for @mas->index or %NULL on dead node.
> > + */
> > +static inline void *mtree_lookup_walk(struct ma_state *mas)
> ...
> 
> > +/**
> > + * mt_prev() - get the previous value in the maple tree
> > + * @mt: The maple tree
> > + * @index: The start index
> > + * @min: The minumum index to check
> 
> s/minumum/minimum/

Thanks

> ...
> 
> > +/**
> > + * mas_nomem() - Check if there was an error allocating and do the allocation
> > + * if necessary If there are allocations, then free them.
> > + * @mas: The maple state
> > + * @gfp: The GFP_FALGS to use for allocations
> 
> s/GFP_FALGS/GFP_FLAGS/

Thanks

> ...
> 
> > +/**
> > + * mtree_insert_range() - Insert an entry at a give range if there is no value.
> > + * @mt: The maple tree
> > + * @first: The start of the range
> > + * @last: The end of the range
> > + * @entry: The entry to store
> > + * @gfp: The FGP_FLAGS to use for allocations.
> 
> s/FGP_FLAGS/GFP_FLAGS/

One day I will type this correctly the first time.  Not today it seems.

> ...


More information about the maple-tree mailing list