[PATCH v6 08/71] Maple Tree: Add new data structure

Mark Hemment markhemm at googlemail.com
Wed Feb 16 02:11:54 PST 2022


Only a few v minor comments.

On Tue, 15 Feb 2022 at 14:43, Liam Howlett <liam.howlett at oracle.com> wrote:
>
> From: "Liam R. Howlett" <Liam.Howlett at Oracle.com>
>
> The maple tree is an RCU-safe range based B-tree designed to use modern
> processor cache efficiently.  There are a number of places in the kernel
> that a non-overlapping range-based tree would be beneficial, especially
> one with a simple interface.  The first user that is covered in this
> patch set is the vm_area_struct, where three data structures are
> replaced by the maple tree: the augmented rbtree, the vma cache, and the
> linked list of VMAs in the mm_struct.  The long term goal is to reduce
> or remove the mmap_sem contention.
>
> The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> nodes.  With the increased branching factor, it is significantly shorter than
> the rbtree so it has fewer cache misses.  The removal of the linked list
> between subsequent entries also reduces the cache misses and the need to pull
> in the previous and next VMA during many tree alterations.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
> Signed-off-by: Liam R. Howlett <Liam.Howlett at oracle.com>
> Tested-by: David Howells <dhowells at redhat.com>
> ---
>  Documentation/core-api/index.rst              |    1 +
>  Documentation/core-api/maple_tree.rst         |  218 +
>  MAINTAINERS                                   |   12 +
>  include/linux/maple_tree.h                    |  683 ++
>  include/trace/events/maple_tree.h             |  123 +
>  init/main.c                                   |    2 +
>  lib/Kconfig.debug                             |   16 +
>  lib/Makefile                                  |    3 +-
>  lib/maple_tree.c                              | 6967 +++++++++++++++++
>  tools/testing/radix-tree/.gitignore           |    2 +
>  tools/testing/radix-tree/Makefile             |   13 +-
>  tools/testing/radix-tree/generated/autoconf.h |    1 +
>  tools/testing/radix-tree/linux/maple_tree.h   |    7 +
>  tools/testing/radix-tree/maple.c              |   59 +
>  .../radix-tree/trace/events/maple_tree.h      |    3 +
>  15 files changed, 8107 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/core-api/maple_tree.rst
>  create mode 100644 include/linux/maple_tree.h
>  create mode 100644 include/trace/events/maple_tree.h
>  create mode 100644 lib/maple_tree.c
>  create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
>  create mode 100644 tools/testing/radix-tree/maple.c
>  create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h

...
> +++ b/include/linux/maple_tree.h
...
> +/*
> + * The Maple Tree squeezes various bits in at various points which aren't
> + * necessarily obvious.  Usually, this is done by observing that pointers are
> + * N-byte aligned and thus the bottom log_2(N) bits are available for use.  We
> + * don't use the high bits of pointers to store additional information because
> + * we don't know what bits are unused on any given architecture.
> + *
> + * Nodes are 256 bytes in size and are also aligned to 256 bytes, giving us 8
> + * low bits for our own purposes.  Nodes are currently of 4 types:

Is it worth adding the compile-time check;
    static_assert(sizeof(struct maple_tree) == 256);
?
...

> +struct ma_wr_state {
> +       struct ma_state *mas;
> +       struct maple_node *node;
> +       enum maple_type type;
> +       unsigned long r_min;
> +       unsigned long r_max;
> +       unsigned char offset_end;
> +       unsigned char node_end;
> +       unsigned long *pivots;
> +       unsigned long end_piv;
> +       void __rcu **slots;
> +       void *entry;
> +       void *content;
> +};

Minor: Moving the member 'type' to below 'r_max', means it will remove
the need for padding (reducing size from 88 to 80 byes).  Does this
matter?  I doubt it.  Better/worse for h/w cache?  Suspect no
difference.
...

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> @@ -827,6 +834,7 @@ config DEBUG_VM_RB
>
>           If unsure, say N.
>
> +
>  config DEBUG_VM_PGFLAGS
>         bool "Debug page-flags operations"
>         depends on DEBUG_VM

Stray blank line.
...

> +++ b/lib/maple_tree.c

> + * Note types:
> + *  0x??1 = Root
> + *  0x?00 = 16 bit nodes
> + *  0x010 = 32 bit nodes
> + *  0x110 = 64 bit nodes
> + *
> + * Slot size and alignment
> + *  0x??1 : Root
> + *  0x?00 : 16 bit values, type in 0-1, slot in 2-6
> + *  0x010 : 32 bit values, type in 0-2, slot in 3-6
> + *  0x110 : 64 bit values, type in 0-2, slit in 3-6
> + */

s/slit/slot/

> +
> +#define MAPLE_PARENT_ROOT              0x01
...

> +/*
> + * ma_meta_gap() - Get the largest gap location of a node from the metadat

s/metadat/metadata/
...

> +/*
> + * mast_rebalace_prev() - Rebalance against the previous node

s/mast_rebalace_prev/mast_rebalance_prev/


> + * @mast: The maple subtree state
> + * @old_l: The encoded maple node to the left (previous node)
> + */
> +static inline void mast_rebalance_prev(struct maple_subtree_state *mast,
> +                                      struct maple_enode *old_l)
...

> +static inline void *mas_state_walk(struct ma_state *mas)
> +{
> +       void *entry;
> +
> +       entry = mas_start(mas);
> +       if (mas_is_none(mas))
> +               return NULL;
> +
> +       if (mas_is_ptr(mas))
> +               return entry;
> +
> +       return mtree_range_walk(mas);
> +}

A blank line after this mas_state_walk() function would help the formatting.

> +/*
> + * mtree_lookup_walk() - Internal quick lookup that does not keep maple state up
> + * to date.
> + *
> + * @mas: The maple state.
> + *
> + * Note: Leaves mas in undesirable state.
> + * Return: The entry for @mas->index or %NULL on dead node.
> + */
> +static inline void *mtree_lookup_walk(struct ma_state *mas)
...

> +/**
> + * mt_prev() - get the previous value in the maple tree
> + * @mt: The maple tree
> + * @index: The start index
> + * @min: The minumum index to check

s/minumum/minimum/
...

> +/**
> + * mas_nomem() - Check if there was an error allocating and do the allocation
> + * if necessary If there are allocations, then free them.
> + * @mas: The maple state
> + * @gfp: The GFP_FALGS to use for allocations

s/GFP_FALGS/GFP_FLAGS/
...

> +/**
> + * mtree_insert_range() - Insert an entry at a give range if there is no value.
> + * @mt: The maple tree
> + * @first: The start of the range
> + * @last: The end of the range
> + * @entry: The entry to store
> + * @gfp: The FGP_FLAGS to use for allocations.

s/FGP_FLAGS/GFP_FLAGS/
...

Cheers,
Mark



More information about the maple-tree mailing list