Query: ARM64: A random failure with hugetlbfs linked mmap() of a stack area

Mark Rutland mark.rutland at arm.com
Mon Mar 27 05:18:06 PDT 2017


On Sat, Mar 25, 2017 at 05:44:58PM +0530, Pratyush Anand wrote:
> On Friday 24 March 2017 11:46 PM, Mark Rutland wrote:
> >>>For your report, it's not clear to me what's going on. Did you take the
> >>>/proc/pid/maps data from the exact same process that the segfault
> >>>occurred in? and/or did you disable ASLR?
> >>Yes, it is from the same process.
> >That is troubling; I cannot explain that.
> 
> Can you please try in an infinite loop for some time and see if
> "SIGSEGV" is received in any of the runs at your end.

After several thousand runs, I see a few unhandled translation faults
(all for address 0) in dmesg.

I suspect that in this case, the hugepage has clobbered some data
structure used shortly after the return from the syscall, and we end up
dereferencing a pointer that's been replaced with zeroes.

> >>Since, I was not able to reproduce with gdb so, I had inserted a
> >>scanf() just before mmap() and then had read /proc/pid/maps.
> >That might be because GDB disables ASLR by default. Did you re-enable
> >ASLR within GDB with:
> >
> >	set disable-randomization off
> >
> >If not, could you give that a go?
> 
> Yes, with ASLR enabled, it reproduced in GDB as well. I do not see
> SIGILL, it is SIGSEGV there too.

So far, I have not managed to trigger a single SIGSEGV while running
under GDB.

However, I have a theory that could explain that. I suspect that my
toolchain has built the binary with an executable stack, while yours has
not. Linux automatically sets READ_IMPLIES_EXEC for binaries with
executable stacks, which IIUC would implicitly make the mmap RWX rather
than RW.

So in my case, the huge page is executable, and I get a SIGILL when
trying to execute from it. In your case, the huge page is not
executable, so you get a SIGSEGV.
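
One way to test this theory without rebuilding anything is to read the
personality flags of the running test process on each machine. The
sketch below assumes Linux's /proc/<pid>/personality file (a single hex
value) and the READ_IMPLIES_EXEC and ADDR_NO_RANDOMIZE bit values from
linux/personality.h; the function names are made up for illustration.

```python
# Bit values as defined in linux/personality.h.
READ_IMPLIES_EXEC = 0x0400000
ADDR_NO_RANDOMIZE = 0x0040000


def parse_personality(text):
    """Decode the hex string found in /proc/<pid>/personality."""
    value = int(text.strip(), 16)
    return {
        "read_implies_exec": bool(value & READ_IMPLIES_EXEC),
        "aslr_disabled": bool(value & ADDR_NO_RANDOMIZE),
    }


def read_personality(pid="self"):
    """Read and decode the personality of a process (Linux only)."""
    with open(f"/proc/{pid}/personality") as f:
        return parse_personality(f.read())
```

Calling read_personality() with the PID of the test while it is stopped
(for example at the scanf()) should show whether READ_IMPLIES_EXEC
differs between the two builds, and whether ASLR was really active for
that run.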

Looking at your report below:

> Mapped address spaces:
> 
>           Start Addr           End Addr       Size     Offset objfile
>             0x400000           0x410000    0x10000        0x0
> /home/panand/work/hugetlb/hugetlb_test_stack
>             0x410000           0x420000    0x10000        0x0
> /home/panand/work/hugetlb/hugetlb_test_stack
>             0x420000           0x430000    0x10000    0x10000
> /home/panand/work/hugetlb/hugetlb_test_stack

All the entries from here ...

>       0xffffada70000     0xffffadbd0000   0x160000        0x0
> /usr/lib64/libc-2.17.so
>       0xffffadbd0000     0xffffadbe0000    0x10000   0x150000
> /usr/lib64/libc-2.17.so
>       0xffffadbe0000     0xffffadbf0000    0x10000   0x160000
> /usr/lib64/libc-2.17.so
>       0xffffadc10000     0xffffadc20000    0x10000        0x0 [vvar]
>       0xffffadc20000     0xffffadc30000    0x10000        0x0 [vdso]
>       0xffffadc30000     0xffffadc50000    0x20000        0x0
> /usr/lib64/ld-2.17.so
>       0xffffadc50000     0xffffadc60000    0x10000    0x10000
> /usr/lib64/ld-2.17.so
>       0xffffadc60000     0xffffadc70000    0x10000    0x20000

... to here ...

> /usr/lib64/ld-2.17.so
>       0xffffcb1d0000     0xffffcb200000    0x30000        0x0 [stack]
> (gdb) c
> Continuing.
> hpage_size is 20000000
> file path is /mnt/hugetlbfs/test
> stack_address is 0xffffcb1facc0
> Address to be mapped is 0xffffa0000000

... are clobbered by this map, which will cover the range:

	0xffffa0000000-0xffffc0000000
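
As a rough cross-check of that overlap, here is a small sketch that
takes the start/end addresses from the quoted mappings, the mmap address
and the len=536870912 from the backtrace, and lists which entries fall
inside the new mapping. The variable and function names are only
illustrative.

```python
# (start, end, name) copied from the quoted "Mapped address spaces" output.
MAPPINGS = [
    (0x400000, 0x410000, "hugetlb_test_stack"),
    (0x410000, 0x420000, "hugetlb_test_stack"),
    (0x420000, 0x430000, "hugetlb_test_stack"),
    (0xffffada70000, 0xffffadbd0000, "libc-2.17.so"),
    (0xffffadbd0000, 0xffffadbe0000, "libc-2.17.so"),
    (0xffffadbe0000, 0xffffadbf0000, "libc-2.17.so"),
    (0xffffadc10000, 0xffffadc20000, "[vvar]"),
    (0xffffadc20000, 0xffffadc30000, "[vdso]"),
    (0xffffadc30000, 0xffffadc50000, "ld-2.17.so"),
    (0xffffadc50000, 0xffffadc60000, "ld-2.17.so"),
    (0xffffadc60000, 0xffffadc70000, "ld-2.17.so"),
    (0xffffcb1d0000, 0xffffcb200000, "[stack]"),
]

MAP_ADDR = 0xffffa0000000   # "Address to be mapped"
MAP_LEN = 536870912         # len passed to __mmap (0x20000000)
MAP_END = MAP_ADDR + MAP_LEN
FAULT_PC = 0x0000ffffadb45a44


def overlaps(start, end, lo, hi):
    """True if the half-open ranges [start, end) and [lo, hi) intersect."""
    return start < hi and lo < end


clobbered = [m for m in MAPPINGS if overlaps(m[0], m[1], MAP_ADDR, MAP_END)]

print(f"new mapping: {MAP_ADDR:#x}-{MAP_END:#x}")
for start, end, name in clobbered:
    print(f"  clobbers {start:#x}-{end:#x} {name}")
```

With these inputs the list covers libc, [vvar], [vdso] and ld.so, while
the binary itself and [stack] lie outside the range.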

> Program received signal SIGSEGV, Segmentation fault.
> 0x0000ffffadb45a44 in __mmap (addr=<optimized out>, len=536870912,
> prot=3, flags=17, fd=7, offset=0)

That address falls within libc-2.17.so, which is clobbered by the mmap.

Do you happen to know how to parse that 'prot=3' in the SEGV report? I'm
guessing that means RW, !X.
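
For reference, a minimal sketch of decoding those two arguments,
assuming the generic Linux values from asm-generic/mman-common.h (which
arm64 uses). The decode() helper is hypothetical. Note that these are
the values as passed to __mmap, so they would not show any PROT_EXEC
that the kernel adds on its own for READ_IMPLIES_EXEC.

```python
# Generic Linux values (asm-generic/mman-common.h).
PROT_BITS = [
    (0x1, "PROT_READ"),
    (0x2, "PROT_WRITE"),
    (0x4, "PROT_EXEC"),
]
MAP_BITS = [
    (0x01, "MAP_SHARED"),
    (0x02, "MAP_PRIVATE"),
    (0x10, "MAP_FIXED"),
    (0x20, "MAP_ANONYMOUS"),
]


def decode(value, table):
    """Render a bitmask as FLAG|FLAG, keeping unknown bits as hex."""
    names = [name for bit, name in table if value & bit]
    known = 0
    for bit, _ in table:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append(hex(leftover))
    return "|".join(names) or "0"


print("prot=3   ->", decode(3, PROT_BITS))
print("flags=17 ->", decode(17, MAP_BITS))
```

Under those assumptions prot=3 is PROT_READ|PROT_WRITE (so RW, !X as
requested), and flags=17 is MAP_SHARED|MAP_FIXED.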

Thanks,
Mark.


