traversing vma on nommu
Liam R. Howlett
Liam.Howlett at oracle.com
Tue Nov 5 07:04:25 PST 2024
* Hajime Tazaki <thehajime at gmail.com> [241104 21:24]:
>
> Hello,
>
> I'd like to ask your help to debug my out-of-tree kernel extension,
> which is still in an RFC stage. The RFC is about nommu extension to
> UML (user mode linux).
>
> https://lore.kernel.org/linux-um/m2y129hpx5.wl-thehajime@gmail.com/T/#t
>
> I've been using v6.10 tag for the base branch for a while, mostly
> works fine but after I rebased my branch to v6.12-rc2, I faced a crash
> upon a process exit during vma iteration.
Does this happen in later v6.12-rc releases as well?
There are a few issues fixed in later releases which may solve your
problem.
There was a patch for a NUMA scheduler issue with iterations, discovered
with stress-ng, which I introduced in my conversion to the maple tree in
kernel/sched/fair.c [1]. I'm pretty sure this has nothing to do with
what you are seeing, but it's worth a mention.
Does it happen without your patches? It is not clear from your
statement above, but maybe you can't check, considering it's nommu and
you are trying to get UML going.
>
> I bisected this issue and found that if I reverted the 4 commits
> below, the issue is gone.
>
> ed4dfd9aa1b1 maple_tree: make write helper functions void
> c27e6183c654 maple_tree: remove unneeded mas_wr_walk() in mas_store_prealloc()
> add60ea5f6d8 maple_tree: remove repeated sanity checks from write helper functions
> 9155e8433498 maple_tree: remove node allocations from various write helper functions
None of these patches will change the way any of this works; they are
all on the write path, and you are reading the vmas.
It seems like this might be some sort of a race. How many times were
you able to recreate it with/without those changes?
It is also on task teardown, which uses a special path: no one else
holds a reference to the mm struct, so we can be sure there are no
readers within the tree.
>
> I'd like to debug what's wrong with my code but no luck so far.
> I thought it is related with nommu code (mm/nommu.c) but didn't find
> any useful hints for me.
>
> It'd be very great if you have similar experience on this kind of
> issue (tree iteration over vma, etc), or share some common pitfall
> when using maple tree library.
>
> below is the log of a gdb session.
>
> ```
> ${HOSTNAME%%.*}:$PWD# apk add se
> afetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
> fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
> ERROR: unable to select packages:
> se (no such package):
> required by: world[se]
I'm not familiar with Alpine Linux; is this relevant to the issue?
>
> Thread 1 "vmlinux" received signal SIGSEGV, Segmentation fault.
Is the bug that you got a SIGSEGV, or is that the test (i.e., are you
sending a SIGSEGV to make sure task teardown is okay)?
> acct_collect (exitcode=exitcode@entry=256, group_dead=1) at ../kernel/acct.c:565
> 565 vsize += vma->vm_end - vma->vm_start;
> (gdb) l
> 560 VMA_ITERATOR(vmi, mm, 0);
> 561 struct vm_area_struct *vma;
> 562
> 563 mmap_read_lock(mm);
> 564 for_each_vma(vmi, vma)
> 565 vsize += vma->vm_end - vma->vm_start;
> 566 mmap_read_unlock(mm);
> 567 }
This is fine. You have the mm locked (from current->mm) so the vma tree
should not be modified.
> 568
> 569 spin_lock_irq(&current->sighand->siglock);
> (gdb) bt
> #0 acct_collect (exitcode=exitcode@entry=256, group_dead=1) at ../kernel/acct.c:565
> #1 0x000000006003d5af in do_exit (code=code@entry=256) at ../kernel/exit.c:918
do_exit() continues to use current->mm after this point as well, so
current->mm itself seems safe. We'd know if the referen
> #2 0x000000006003deca in do_group_exit (exit_code=256) at ../kernel/exit.c:1088
> #3 0x000000006003dee4 in __do_sys_exit_group (error_code=<optimized out>) at ../kernel/exit.c:1099
> #4 __se_sys_exit_group (error_code=<optimized out>) at ../kernel/exit.c:1097
> #5 0x000000006003840d in do_syscall_64 (regs=0x705aadd8) at ../arch/x86/um/do_syscall_64.c:83
> #6 0x000000006003855e in __kernel_vsyscall () at ../arch/x86/um/entry_64.S:73
> #7 0x00000087000081ed in ?? ()
> #8 0x671cad19671cad19 in ?? ()
> #9 0x00000000671ca871 in ?? ()
> #10 0x0000000800010000 in ?? ()
> #11 0x0000000400080000 in ?? ()
> #12 0x000000040001f30a in ?? ()
> #13 0x0000000000000000 in ?? ()
> ```
I suspect that something went wrong in nommu with regard to
adding/removing/splitting vmas, but you only discover the issue much
later.
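If the tree is being corrupted when nommu adds, removes, or splits VMAs, checking invariants right after each modification moves the failure from exit time to the operation that caused it. The kernel's CONFIG_DEBUG_VM_MAPLE_TREE option enables in-kernel consistency checks along these lines; the code below is not that implementation, only a hypothetical userspace sketch: `struct vma_model`, `vma_model_check()` and `vma_model_split()` are illustrative names, and a sorted array stands in for the maple tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct. Not a kernel type. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Invariants a VMA set must satisfy after every add, remove or split:
 * each VMA is non-empty, and VMAs are sorted by address without
 * overlapping. Returns the index of the first offending VMA, or -1 if
 * the set is consistent.
 */
static long vma_model_check(const struct vma_model *vmas, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (vmas[i].vm_start >= vmas[i].vm_end)
			return (long)i;		/* empty or inverted range */
		if (i > 0 && vmas[i].vm_start < vmas[i - 1].vm_end)
			return (long)i;		/* overlapping or out of order */
	}
	return -1;
}

/*
 * Hypothetical split: cut vmas[idx] at addr into two VMAs, writing the
 * resulting set into out[], which must have room for nr + 1 entries.
 * Returns the new number of VMAs.
 */
static size_t vma_model_split(const struct vma_model *vmas, size_t nr,
			      size_t idx, unsigned long addr,
			      struct vma_model *out)
{
	size_t n = 0;

	/* The split point must fall strictly inside the VMA. */
	assert(idx < nr);
	assert(addr > vmas[idx].vm_start && addr < vmas[idx].vm_end);

	for (size_t i = 0; i < nr; i++) {
		if (i == idx) {
			out[n++] = (struct vma_model){ vmas[i].vm_start, addr };
			out[n++] = (struct vma_model){ addr, vmas[i].vm_end };
		} else {
			out[n++] = vmas[i];
		}
	}
	/* Validate immediately, so a bug here is caught at the split
	 * rather than much later when something walks the VMAs. */
	assert(vma_model_check(out, n) == -1);
	return n;
}
```

The design choice is to pay for a full walk on every modification in a debug build, in exchange for a failure that points at the offending operation instead of at a later reader such as acct_collect().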
There is a fix in v6.12-rc4, bea07fd63192b ("maple_tree: correct tree
corruption on spanning store"), which you may need. Can you please try a
newer version of the kernel?
[1] https://lore.kernel.org/all/173012156393.1442.1751639070858226239.tip-bot2@tip-bot2/
Thanks,
Liam