Random failures while loading iwmmxt compiled binaries
Enrico Scholz
enrico.scholz at sigma-chemnitz.de
Tue Jun 22 12:15:33 EDT 2010
Hi,
Program loading sometimes segfaults or aborts with
| Inconsistency detected by ld.so: ../elf/dl-sysdep.c: 465: _dl_important_hwcaps: Assertion `m == cnt' failed!
Similar reports exist in the glibc[1] and Gentoo[2] bug trackers.
During normal usage it happens very rarely.
I have partially analyzed it and can reproduce it in under 10 minutes on
PXA270 and PXA320 platforms, but I have not found a solution yet. These
platforms were running kernel 2.6.34 (PXA270) and 2.6.31 (PXA320).
Conclusions:
============
* it is not a compiler bug
* it is not an eglibc bug
* it can only be the kernel (iwmmxt state not restored properly on
task switch?) or a bug in the silicon
Steps to reproduce it:
======================
1. Build (e)glibc and busybox with -march=iwmmxt -mcpu=iwmmxt -mtune=iwmmxt;
I placed binaries at [3] which are based on OpenEmbedded's gcc 4.3.4 and
eglibc 2.12.
2. Place the binaries into a tmpfs (this is important for triggering the
bug quickly; NFS or MTD takes longer)
mount -t tmpfs -o size=8m none /tmp
mkdir /tmp/bin /tmp/lib
cp /bin/busybox /tmp/bin/
cp /lib/ld[.-]* /tmp/lib
cp /lib/lib[cm][.-]* /tmp/lib
ln -s busybox /tmp/bin/sh
ln -s busybox /tmp/bin/sed
3. Create a test script like
cat << 'EOF' > /tmp/x
#! /bin/sh
while test `echo abcdefghijkl |
sed 's!.$!!' | sed 's!.$!!' | sed 's!.$!!' | \
sed 's!.$!!' | sed 's!.$!!' | sed 's!.$!!' | \
sed 's!.$!!' | sed 's!^.!!' | sed 's!^.!!'` = cde; do
:
done
EOF
4. Chroot into /tmp and execute the script
chroot /tmp
sh /x
After some time, you will get either the assertion or a segfault.
Analysis
========
With 'user_debug=-1' on the kernel command line, the segfault gives
[ 2280.392473] sed: unhandled page fault (11) at 0x40026200, code 0x017
[ 2280.392500] pgd = c7bf0000
[ 2280.395191] [40026200] *pgd=87b57031, *pte=00000000, *ppte=00000000
[ 2280.401529]
[ 2280.408893] Pid: 8831, comm: sed
[ 2280.418649] CPU: 0 Not tainted (2.6.31.13 #1)
[ 2280.424497] PC is at 0x40016a84
[ 2280.427610] LR is at 0x40014b0c
[ 2280.430724] pc : [<40016a84>] lr : [<40014b0c>] psr: 20000010
[ 2280.430733] sp : befff8f8 ip : befff910 fp : befff99c
[ 2280.460747] r10: 00000000 r9 : 6474e552 r8 : 00000004
[ 2280.466537] r7 : 00000000 r6 : 00000012 r5 : 00000201 r4 : 40026202
[ 2280.473452] r3 : befff8f8 r2 : 00000009 r1 : 40026200 r0 : 40026202
[ 2280.479937] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
[ 2280.487354] Control: 0400397f Table: 87bf0018 DAC: 00000015
Register -> Variable mapping:
-----------------------------
r6,r7: 'masked' --> strange: it seems to be always 0x12
r8: 'm'
r5: 'n'
r2: 'masked' >> 'n & 0xff' (r3 destroyed by code calling strlen())
The PC is in strlen(), which was called from _dl_important_hwcaps() (see LR):
00016a80 <strlen>:
16a80: e3c01003 bic r1, r0, #3 ; 0x3
16a84: e4912004 ldr r2, [r1], #4
000147ec <_dl_important_hwcaps>:
14aa4: e1961007 orrs r1, r6, r7 <<<< the two 32 bit words of 'masked'
14aa8: 0a000023 beq 14b3c <_dl_important_hwcaps+0x350>
14aac: e51b4068 ldr r4, [fp, #-104]
14ab0: e51b206c ldr r2, [fp, #-108]
14ab4: e3a00001 mov r0, #1 ; 0x1
14ab8: e3a01000 mov r1, #0 ; 0x0
14abc: e084e002 add lr, r4, r2
14ac0: e28e4050 add r4, lr, #80 ; 0x50
14ac4: e3a05000 mov r5, #0 ; 0x0 <<<< this is 'n'
14ac8: ec41000a tmcrr wr10, r0, r1
14acc: ea000003 b 14ae0 <_dl_important_hwcaps+0x2f4>
14ad0: e1961007 orrs r1, r6, r7 <<<< start of loop
14ad4: e284400a add r4, r4, #10 ; 0xa
14ad8: 0a000017 beq 14b3c <_dl_important_hwcaps+0x350>
14adc: e2855001 add r5, r5, #1 ; 0x1
14ae0: ec476004 tmcrr wr4, r6, r7 <<<< move 'masked' into cp
14ae4: ee085110 tmcr wcgr0, r5
14ae8: eee40148 wsrldg wr0, wr4, wcgr0 <<<< right shift of 'masked' for n & 0xff bits
14aec: ec532000 mra r2, r3, acc0 <<< this is tricky... acc0 is wr0
14af0: e2020001 and r0, r2, #1 ; 0x1
14af4: e3500000 cmp r0, #0 ; 0x0 <<<< that's the 'if (...'
14af8: 0afffff4 beq 14ad0 <_dl_important_hwcaps+0x2e4>
14afc: e51b3044 ldr r3, [fp, #-68]
14b00: e1a00004 mov r0, r4
14b04: e7834188 str r4, [r3, r8, lsl #3]
>> 14b08: eb0007dc bl 16a80 <strlen>
14b0c: e51b1044 ldr r1, [fp, #-68]
14b10: ee095110 tmcr wcgr1, r5 <<<< move 'n' into cp
14b14: eeda5149 wslldg wr5, wr10, wcgr1 <<< left shift of 1 for n & 0xff bits
14b18: e081c188 add ip, r1, r8, lsl #3
14b1c: e58c0004 str r0, [ip, #4]
14b20: ec510005 tmrrc r0, r1, wr5 <<<< split 64 bit word
14b24: e0266000 eor r6, r6, r0 <<<< and do the ^=
14b28: e0277001 eor r7, r7, r1
14b2c: e1961007 orrs r1, r6, r7
14b30: e2888001 add r8, r8, #1 <<<< this is 'm'
14b34: e284400a add r4, r4, #10
14b38: 1affffe7 bne 14adc <_dl_important_hwcaps+0x2f0>
The corresponding code is
------------
uint64_t masked = GLRO(dl_hwcap) & GLRO(dl_hwcap_mask);
for (n = 0; masked != 0; ++n)
if ((masked & (1ULL << n)) != 0)
{
temp[m].str = _dl_hwcap_string (n);
temp[m].len = strlen (temp[m].str);
masked ^= 1ULL << n;
++m;
}
------------
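As a reading aid, the 64 bit handling in the listing can be written out on two 32 bit halves. This is only a sketch for n < 64 (where the C shift is well defined); the helper names pack64(), bit_is_set() and clear_bit_pair() are made up for illustration and do not appear in eglibc.

```c
#include <assert.h>
#include <stdint.h>

/* Models "tmcrr wRd, lo, hi": two 32 bit core registers form one
 * 64 bit coprocessor register (low word first). */
uint64_t pack64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Models the "if ((masked & (1ULL << n)) != 0)" test on the r6/r7 pair. */
int bit_is_set(uint32_t r6, uint32_t r7, unsigned n)
{
    assert(n < 64);                       /* sketch covers valid counts only */
    return (int)((pack64(r6, r7) >> n) & 1);
}

/* Models "masked ^= 1ULL << n" as done at 14b10..14b28. */
void clear_bit_pair(uint32_t *r6, uint32_t *r7, unsigned n)
{
    assert(n < 64);
    uint64_t bit = UINT64_C(1) << n;      /* wslldg wr5, wr10, wcgr1 */
    uint32_t r0 = (uint32_t)bit;          /* tmrrc r0, r1, wr5: low half  */
    uint32_t r1 = (uint32_t)(bit >> 32);  /*                    high half */
    *r6 ^= r0;                            /* eor r6, r6, r0 */
    *r7 ^= r1;                            /* eor r7, r7, r1 */
}
```

Starting from r6 = 0x12, r7 = 0, clearing bits 1 and 4 this way leaves both halves zero, which is the loop's normal end condition.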
The crash happens because 'n' (stored in r5) becomes 0x201, so an array
element outside of the mapped memory is accessed.
A value of 0x201 means that the loop has missed its end condition twice
(the first time at 0x01 and the second time at 0x101).
The assertion probably fires when the loop exits in the second round
(0x101).
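The address arithmetic behind this can be checked with a few lines of C. The sketch below only uses numbers from the register dump and the listing (r0 = 0x40026202, the 'bic r1, r0, #3' in strlen(), and the 'add r4, r4, #10' per step of 'n'); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* First instruction of strlen(): bic r1, r0, #3
 * (round the pointer down to a word boundary). */
uint32_t word_align(uint32_t addr)
{
    return addr & ~UINT32_C(3);
}

/* r4 advances by 10 bytes for every increment of 'n'. */
uint32_t entry_offset(uint32_t n)
{
    return n * 10u;
}

void check_fault_address(void)
{
    /* r0 from the dump, aligned, is the faulting address. */
    assert(word_align(UINT32_C(0x40026202)) == UINT32_C(0x40026200));
    /* n = 0x201 is 5130 bytes past entry 0 ... */
    assert(entry_offset(0x201) == 0x140A);
    /* ... while the last valid entry (n = 63) is only 630 bytes in. */
    assert(entry_offset(63) == 630);
}
```

So the first word load in strlen() lands exactly on the address reported in the "unhandled page fault" line, several kilobytes beyond anything a 64 bit mask could legitimately index.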
The generated assembly code looks sane to me and I do not see how 'n' can
become >64 (or I looked at it too long and missed the obvious). Within
the loop, only strlen() is called, and this alters r0-r3 only.
As I wrote above, 'masked' always seems to be '0x12', which means that
the same bits are always left uncleared.
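One way to test this reading against the register dump is to replay the loop in plain C. The following is a hypothetical model, not the eglibc code: it assumes the shifts use only the low 8 bits of the count (as annotated in the listing), that counts of 64..255 shift every bit out, and that in the faulty case the 'masked ^= ...' update is simply lost. All names are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts using only the low 8 bits of the count.
 * ASSUMPTION: counts of 64..255 produce 0. */
static uint64_t shr_low8(uint64_t v, unsigned n)
{
    unsigned c = n & 0xff;
    return c >= 64 ? 0 : v >> c;
}

static uint64_t shl_low8(uint64_t v, unsigned n)
{
    unsigned c = n & 0xff;
    return c >= 64 ? 0 : v << c;
}

struct loop_state { unsigned n, m; uint64_t masked; };

/* Replays the loop. With xor_works == 0 the XOR is dropped (hypothetical
 * fault model). The replay stops when a set bit is found at n == stop_n,
 * which stands in for the faulting strlen() call; without such a stop the
 * faulty variant would never terminate. */
struct loop_state run_loop(uint64_t masked, int xor_works, unsigned stop_n)
{
    struct loop_state s = { 0, 0, masked };
    for (s.n = 0; s.masked != 0; ++s.n) {
        if (shr_low8(s.masked, s.n) & 1) {
            if (s.n == stop_n)
                break;
            if (xor_works)
                s.masked ^= shl_low8(1, s.n);
            ++s.m;
        }
    }
    return s;
}

void check_model(void)
{
    /* Healthy run: bits 1 and 4 are cleared, two entries are produced. */
    struct loop_state ok = run_loop(0x12, 1, 0x201);
    assert(ok.m == 2 && ok.masked == 0);

    /* Faulty run: hits at n = 1, 4, 0x101, 0x104, then stops at 0x201. */
    struct loop_state bad = run_loop(0x12, 0, 0x201);
    assert(bad.n == 0x201 && bad.m == 4 && bad.masked == 0x12);
}
```

Under these assumptions the faulty replay stops with n = 0x201, m = 4 and masked = 0x12, which agrees with r5, r8 and r6/r7 in the dump. That is consistent with the XOR result being lost on every pass, but it does not show where it gets lost.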
Does anybody have other ideas or a solution?
Enrico
Footnotes:
[1] http://sourceware.org/bugzilla/show_bug.cgi?id=6729
[2] http://bugs.gentoo.org/show_bug.cgi?id=194973
[3] https://www.cvg.de/people/ensc/iwmmxt-ld.tar.gz