[PATCH] maple_tree: Fix use of node for global range in mas_wr_spanning_store()

Liam Howlett liam.howlett at oracle.com
Fri Jul 8 18:09:22 PDT 2022


* Hugh Dickins <hughd at google.com> [220708 00:37]:
> On Wed, 6 Jul 2022, Liam Howlett wrote:
> > * Andrew Morton <akpm at linux-foundation.org> [220705 22:55]:
> > > On Wed, 6 Jul 2022 02:05:37 +0000 Liam Howlett <liam.howlett at oracle.com> wrote:
> > > 
> > > > When writing a range with a NULL which expands to 0 - ULONG_MAX, don't
> > > > use a node to store this value.  Instead, call mas_new_root() which will
> > > > set the tree pointer to NULL and free all the nodes.
> > > > 
> > > > Fix a comment for the allocations in mas_wr_spanning_store().
> > > > 
> > > > Add mas_node_count_gfp() and use it to clean up mas_preallocate().
> > > > 
> > > > Clean up mas_preallocate() and ensure the ma_state is safe on return.
> > > > 
> > > > Update maple_tree.h to set alloc = NULL.
> > > 
> > > Cool.
> > > 
> > > How are we looking now?  Any known issues still being worked on?
> > 
> > Did you pick up "Subject: [PATCH] mm/mmap: Fix copy_vma() new_vma
> > check"?  I sent that yesterday as well.
> > 
> > I think we are in good shape.  There were two outstanding issues I had
> > and this patch plus the copy_vma() patch fixes both.
> 

First, thank you again for looking at this patch set.  Your analysis of
my changes always end up educating me and making the code better.


> I'm not so sure.

I agree, I have a few more issues now.

> 
> I haven't gone back to check, but I do think there was a very recent
> lib/maple_tree.c change or set of changes which greatly improved it
> (say an hour to reach a problem instead of a minute).
> 
> But errors I do see still (I've only had time for it this month, and
> there have been three other non-maple-tree bugs in linux-next which
> got in the way of my huge tmpfs kernel build swapping load testing).
> 
> I see one kind of problem on one machine, and another on another,
> and have no idea why the difference, so cannot advise you on how
> to reproduce either.  I do have CONFIG_DEBUG_VM_MAPLE_TREE=y and
> CONFIG_DEBUG_MAPLE_TREE=y on both; but might try removing those,
> since they tell me nothing without a long education, and more
> important for me to write this mail now than get into capturing
> those numbers for you.
> 
> One machine does show such output, with BUG at mas_validate_gaps,
> doing __vma_adjust from __split_vma from do_mas_align_munmap.
> That's fairly typical of all reports from that machine, I think.

I woke up to a log of this, or something similar on my server - I've not
seen it on my laptop either.  There have been some connection issues
here so I've not been able to get the details I need to reproduce or
narrow this down.

> 
> Other machine (laptop I'm writing from) has never shown any such MT
> debug output, but hits the kernel BUG at include/linux/swapops.h:378!
> pfn_swap_entry_to_page() BUG_ON(is_migration_entry() && !PageLocked().
> 
> I've often called that the best BUG in the kernel: it tells whether
> rmap is holding together: migration entries were inserted on one
> rmap walk, some may have been zapped by unmapping meanwhile, but
> before midnight strikes and the page is unlocked, all the remaining
> ones have to be found by the second rmap walk.  (And if we removed the
> BUG, then it would be mysterious rare segfaults and/or data leaks:
> which I cannot call "stable").
> 
> Hitting that BUG suggests that the rmap locking is deficient somewhere
> (might be something else, but that's a best guess), and I wouldn't be
> surprised if that somewhere is __vma_adjust() - which is not quite
> the same as it was before (I wonder if the changes were essential for
> maple tree, or "simplifications" which seemed a good idea at the time).
> If there's a way to minimally maple-ize how it was before, known working,
> that would be a useful comparison to make.

I have not seen this bug.  I have re-evaluated my changes to
__vma_adjust() and do not see any lock moving as you discovered in
vma_expand().  The changes to __vma_adjust() were to remove the linked
list or inlined functions that were reduced to a line or two.  That, and
preallocating to avoid the fs-reclaim lockdep issue.

> 
> Here's a patch I've applied, that makes no difference to the problems
> I see, but fixes or highlights a little that worried me in the source.
> Earlier, you had that anon_vma locking in vma_expand() under "else",
> it's good to see that fixed, but I believe there's a case it misses;
> and I don't think you can change the lock ordering just because it looks
> nicer (see comments at head of mm/rmap.c for lock ordering, or mm/mremap.c
> for an example of what order we take file versus anon rmap locks).
> 

I will fix the issue you pointed out in your code. I'll also have a look
at any other areas I may have messed up the locking -  as soon as read
that block at the start mm/rmap.c a few more dozen times.


> Hopefully maple tree stable is not far off, but seems not quite yet.

Al Viro also told me he saw a significant regression to xfs test 250, a
slow down of a factor of 2.  I started to look into this as well.  I
have not seen this slow down in my testing, but that must be flushed out
as well.  Perhaps my access to larger machines will help in the
reproduction of this issue as well.

So there are still a few issues to contend with before this is 'stable'.
Any analysis of the patches you are able to do would probably expedite
the stability path.

Thanks again,
Liam



More information about the maple-tree mailing list