traversing vma on nommu
Hajime Tazaki
thehajime at gmail.com
Tue Nov 5 15:48:28 PST 2024
Hello Liam,
Thank you for taking the time to give me useful information, even though
my report of the situation was insufficient; I apologize for my laziness.
On Wed, 06 Nov 2024 00:04:25 +0900,
Liam R. Howlett wrote:
>
> * Hajime Tazaki <thehajime at gmail.com> [241104 21:24]:
> >
> > Hello,
> >
> > I'd like to ask your help to debug my out-of-tree kernel extension,
> > which is still in an RFC stage. The RFC is about nommu extension to
> > UML (user mode linux).
> >
> > https://lore.kernel.org/linux-um/m2y129hpx5.wl-thehajime@gmail.com/T/#t
> >
> > I've been using v6.10 tag for the base branch for a while, mostly
> > works fine but after I rebased my branch to v6.12-rc2, I faced a crash
> > upon a process exit during vma iteration.
>
> Does this happen in later v6.12-rc releases as well?
>
> There are a few issues fixed in later releases which may solve your
> problem.
I pulled/rebased v6.12-rc6 and the issue still persists.
Thread 1 "vmlinux" received signal SIGSEGV, Segmentation fault.
acct_collect (exitcode=exitcode@entry=256, group_dead=group_dead@entry=1) at ../kernel/acct.c:565
565 vsize += vma->vm_end - vma->vm_start;
(gdb) p vma
$4 = (struct vm_area_struct *) 0xe
(gdb) p vmi
$5 = {
mas = {
tree = 0x703e8708,
index = 1886371840,
last = 1886371839,
node = 0x70bef30c,
min = 0,
max = 1886371839,
alloc = 0x0,
status = ma_active,
depth = 1 '\001',
offset = 15 '\017',
mas_flags = 0 '\000',
end = 14 '\016',
store_type = wr_invalid
}
}
(gdb) bt
#0 acct_collect (exitcode=exitcode@entry=256, group_dead=group_dead@entry=1)
at ../kernel/acct.c:565
#1 0x000000006003de9d in do_exit (code=code@entry=256) at ../kernel/exit.c:918
#2 0x000000006003e842 in do_group_exit (exit_code=256) at ../kernel/exit.c:1088
#3 0x000000006003e85c in __do_sys_exit_group (error_code=<optimized out>) at ../kernel/exit.c:1099
#4 __se_sys_exit_group (error_code=<optimized out>) at ../kernel/exit.c:1097
#5 0x0000000060038996 in do_syscall_64 (regs=0x705aadd0) at ../arch/x86/um/do_syscall_64.c:83
#6 0x0000000060038b03 in __kernel_vsyscall () at ../arch/x86/um/entry_64.S:73
#7 0x00010102464c457f in ?? ()
#8 0x0000000000000000 in ?? ()
> There was a patch for a numa scheduler issue with iterations discovered
> with stress-ng - which I introduced in my conversion to the maple tree in
> kernel/sched/fair.c [1]. I'm pretty sure this has nothing to do with
> what you are seeing, but worth a mention.
thanks.
> Does it happen without your patches? It is not clear from your
> statement above - but maybe you can't check considering it's nommu and
> you are trying to get um linux going.
I should have tried this before sending my previous email.
I'm preparing a test environment on another nommu arch with buildroot
and will let you know how it goes.
> > I bisected this issue and found that if I reverted the 4 commits
> > below, the issue is gone.
> >
> > ed4dfd9aa1b1 maple_tree: make write helper functions void
> > c27e6183c654 maple_tree: remove unneeded mas_wr_walk() in mas_store_prealloc()
> > add60ea5f6d8 maple_tree: remove repeated sanity checks from write helper functions
> > 9155e8433498 maple_tree: remove node allocations from various write helper functions
>
> None of these patches will change the way any of this works - they are
> all on the write path and you are reading the vmas.
thanks for the information.
indeed, before this iteration, the data structure (the ma_state embedded
in the vma_iterator) already seems to be broken:
in the gdb output above, taken when the issue happened, the ma_state->end
value is smaller than the ->offset value.
(gdb) p vmi
$5 = {
mas = {
tree = 0x703e8708,
index = 1886371840,
last = 1886371839,
node = 0x70bef30c,
min = 0,
max = 1886371839,
alloc = 0x0,
status = ma_active,
depth = 1 '\001',
offset = 15 '\017',
mas_flags = 0 '\000',
end = 14 '\016',
store_type = wr_invalid
}
}
when other programs pass through this path without crashing, end always
seems to be greater than offset.
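
The observation above can be written down as a small check. This is only a
userspace sketch: struct mas_snapshot and both helper functions are
hypothetical names, not kernel API, and the "offset <= end" rule simply
mirrors what was seen in this gdb session rather than a documented maple
tree invariant. The field values are copied from the gdb output.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mirror of the ma_state fields printed by gdb. */
struct mas_snapshot {
	unsigned long index;
	unsigned long last;
	unsigned long min;
	unsigned long max;
	unsigned char offset;
	unsigned char end;
};

/* Assumption from this thread: a healthy state never has offset > end. */
static bool snapshot_offset_in_node(const struct mas_snapshot *s)
{
	return s->offset <= s->end;
}

/* The [index, last] range is expected to be ordered. */
static bool snapshot_range_ordered(const struct mas_snapshot *s)
{
	return s->index <= s->last;
}

/* Values taken from the "p vmi" output at the time of the crash. */
static const struct mas_snapshot crash_snapshot = {
	.index = 1886371840UL,
	.last = 1886371839UL,
	.min = 0UL,
	.max = 1886371839UL,
	.offset = 15,
	.end = 14,
};
```

With these values both checks fail: offset (15) is past end (14), and index
is exactly max + 1, so the state looks like it has already stepped past the
last range of the node.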
> It seems like this might be some sort of a race. How many times were
> you able to recreate it with/without those changes?
with my uml patches, one specific process (`apk add pkgname` in the gdb
session) always triggers this issue. without my patches, I'll try to
test it on another arch.
> It is also on task teardown, which uses a special path when no one has
> the mm struct reference and so we can be sure there are no readers
> within the tree.
I see.
> > I'd like to debug what's wrong with my code but no luck so far.
> > I thought it is related with nommu code (mm/nommu.c) but didn't find
> > any useful hints for me.
> >
> > It'd be very great if you have similar experience on this kind of
> > issue (tree iteration over vma, etc), or share some common pitfall
> > when using maple tree library.
>
>
> >
> > below is the log of a gdb session.
> >
> > ```
> > ${HOSTNAME%%.*}:$PWD# apk add se
> > afetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
> > fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
> > ERROR: unable to select packages:
> > se (no such package):
> > required by: world[se]
>
> Not entirely sure on alpine linux, is that relevant to the issue?
currently nommu UML only works on alpine, with libc/busybox rebuilt.
not all processes trigger this issue, but the apk command always
shows it on process exit.
> > Thread 1 "vmlinux" received signal SIGSEGV, Segmentation fault.
>
> is the bug that you got a sigsegv or is that the test (ie, are you
> sending a segv to make sure task teardown is okay?)
I guess this sigsegv is generated because the variable vma holds an
invalid address.
(gdb) p vma
$4 = (struct vm_area_struct *) 0xe
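
One possible reading of the 0xe value, not confirmed anywhere in this
thread: the pointer equals the ->end value (14) of the ma_state. The sketch
below is a simplified userspace model of a 16-slot range node in which the
last slot shares storage with a small metadata struct. The model_* names are
hypothetical, and the layout is an assumption based on a reading of
include/linux/maple_tree.h for a 64-bit little-endian build; it should be
checked against the actual tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SLOTS 16	/* assumed slot count of a 64-bit range node */

/* Assumed per-node metadata: last used slot and a gap hint. */
struct model_metadata {
	unsigned char end;
	unsigned char gap;
};

/* Simplified model: the metadata overlays the storage of slot[15]. */
struct model_range_64 {
	void *parent;
	unsigned long pivot[MODEL_SLOTS - 1];
	union {
		void *slot[MODEL_SLOTS];
		struct {
			void *pad[MODEL_SLOTS - 1];
			struct model_metadata meta;
		};
	};
};

/* Return the raw pointer-sized bits stored at slot[offset] of a zeroed
 * node whose metadata has been set to (end, gap). */
static uintptr_t model_read_slot(unsigned char end, unsigned char gap,
				 unsigned int offset)
{
	struct model_range_64 node;
	uintptr_t raw;

	memset(&node, 0, sizeof(node));
	node.meta.end = end;
	node.meta.gap = gap;
	memcpy(&raw, &node.slot[offset], sizeof(raw));
	return raw;
}
```

Under these assumptions, model_read_slot(14, 0, 15) yields 0xe on a
little-endian machine, which would match a read at offset 15 from a node
whose end is 14. This only shows the values are consistent with an
out-of-range slot read; it does not say how offset got past end.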
> > acct_collect (exitcode=exitcode@entry=256, group_dead=1) at ../kernel/acct.c:565
> > 565 vsize += vma->vm_end - vma->vm_start;
> > (gdb) l
> > 560 VMA_ITERATOR(vmi, mm, 0);
> > 561 struct vm_area_struct *vma;
> > 562
> > 563 mmap_read_lock(mm);
> > 564 for_each_vma(vmi, vma)
> > 565 vsize += vma->vm_end - vma->vm_start;
> > 566 mmap_read_unlock(mm);
> > 567 }
>
> This is fine. You have the mm locked (from current->mm) so the vma tree
> should not be modified.
thanks, this is part of kernel/acct.c, which I didn't touch.
> > 568
> > 569 spin_lock_irq(&current->sighand->siglock);
> > (gdb) bt
> > #0 acct_collect (exitcode=exitcode@entry=256, group_dead=1) at ../kernel/acct.c:565
> > #1 0x000000006003d5af in do_exit (code=code@entry=256) at ../kernel/exit.c:918
>
> do_exit uses the current->mm after here as well, so the current->mm
> seems safe. We'd know if the referen
I see.
> > #2 0x000000006003deca in do_group_exit (exit_code=256) at ../kernel/exit.c:1088
> > #3 0x000000006003dee4 in __do_sys_exit_group (error_code=<optimized out>) at ../kernel/exit.c:1099
> > #4 __se_sys_exit_group (error_code=<optimized out>) at ../kernel/exit.c:1097
> > #5 0x000000006003840d in do_syscall_64 (regs=0x705aadd8) at ../arch/x86/um/do_syscall_64.c:83
> > #6 0x000000006003855e in __kernel_vsyscall () at ../arch/x86/um/entry_64.S:73
> > #7 0x00000087000081ed in ?? ()
> > #8 0x671cad19671cad19 in ?? ()
> > #9 0x00000000671ca871 in ?? ()
> > #10 0x0000000800010000 in ?? ()
> > #11 0x0000000400080000 in ?? ()
> > #12 0x000000040001f30a in ?? ()
> > #13 0x0000000000000000 in ?? ()
> > ```
>
> I suspect that something went wrong in nommu in regards to
> adding/removing/splitting vmas, but you only discover the issue much
> later.
indeed.
> There is a fix in v6.12-rc4 bea07fd63192b ("maple_tree: correct tree
> corruption on spanning store") which you may need. Can you please try a
> newer version of the kernel?
>
> [1] https://lore.kernel.org/all/173012156393.1442.1751639070858226239.tip-bot2@tip-bot2/
thanks for the info.
As I noted above, the newer rc6 still shows the same issue.
I'll keep investigating what's going on and get back here once I have
some news.
again, thank you so much for your help.
-- Hajime