Do you have IWMMXT support enabled in your kernel config?<br><br>Rex<br><br><div class="gmail_quote">On Tue, Jun 22, 2010 at 9:15 AM, Enrico Scholz <span dir="ltr"><<a href="mailto:enrico.scholz@sigma-chemnitz.de">enrico.scholz@sigma-chemnitz.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hi,<br>
<br>
I have the problem that program loading sometimes segfaults or aborts<br>
with<br>
<br>
| Inconsistency detected by ld.so: ../elf/dl-sysdep.c: 465: _dl_important_hwcaps: Assertion `m == cnt' failed!<br>
<br>
There are similar existing reports in the glibc[1] and gentoo[2] bugtrackers,<br>
and it happens very rarely during normal usage.<br>
<br>
I analyzed it partially and can reproduce it in <10 minutes on PXA270<br>
and PXA320 platforms, but I have not found a solution yet. These<br>
platforms were running with kernel 2.6.34 (PXA270) and 2.6.31 (PXA320).<br>
<br>
<br>
Conclusions:<br>
============<br>
<br>
* it is not a compiler bug<br>
<br>
* it is not an eglibc bug<br>
<br>
* it can only be the kernel (iwmmxt state not restored properly on<br>
task switches?) or a bug in the silicon<br>
<br>
<br>
Steps to reproduce it:<br>
======================<br>
<br>
1. Build (e)glibc and busybox with -march=iwmmxt -mcpu=iwmmxt -mtune=iwmmxt;<br>
I placed binaries at [3] which are based on OpenEmbedded's gcc 4.3.4 and<br>
eglibc 2.12.<br>
<br>
2. Place the binaries into a tmpfs (that's important for triggering the bug<br>
in a short time; NFS or MTD takes longer)<br>
<br>
mount -t tmpfs -o size=8m none /tmp<br>
mkdir /tmp/bin /tmp/lib<br>
cp /bin/busybox /tmp/bin/<br>
cp /lib/ld[.-]* /tmp/lib<br>
cp /lib/lib[cm][.-]* /tmp/lib<br>
ln -s busybox /tmp/bin/sh<br>
ln -s busybox /tmp/bin/sed<br>
<br>
3. Create a test script like<br>
<br>
cat << 'EOF' > /tmp/x<br>
#! /bin/sh<br>
<br>
while test `echo abcdefghijkl |<br>
sed 's!.$!!' | sed 's!.$!!' | sed 's!.$!!' | \<br>
sed 's!.$!!' | sed 's!.$!!' | sed 's!.$!!' | \<br>
sed 's!.$!!' | sed 's!^.!!' | sed 's!^.!!'` = cde; do<br>
:<br>
done<br>
EOF<br>
<br>
4. Enter the /tmp chroot and execute the script<br>
<br>
chroot /tmp<br>
sh /x<br>
<br>
<br>
After some time, you will get either the assertion or a segfault.<br>
<br>
<br>
<br>
Analysis<br>
========<br>
<br>
With 'user_debug=-1' on the kernel command line, the segfault gives<br>
<br>
[ 2280.392473] sed: unhandled page fault (11) at 0x40026200, code 0x017<br>
[ 2280.392500] pgd = c7bf0000<br>
[ 2280.395191] [40026200] *pgd=87b57031, *pte=00000000, *ppte=00000000<br>
[ 2280.401529]<br>
[ 2280.408893] Pid: 8831, comm: sed<br>
[ 2280.418649] CPU: 0 Not tainted (2.6.31.13 #1)<br>
[ 2280.424497] PC is at 0x40016a84<br>
[ 2280.427610] LR is at 0x40014b0c<br>
[ 2280.430724] pc : [<40016a84>] lr : [<40014b0c>] psr: 20000010<br>
[ 2280.430733] sp : befff8f8 ip : befff910 fp : befff99c<br>
[ 2280.460747] r10: 00000000 r9 : 6474e552 r8 : 00000004<br>
[ 2280.466537] r7 : 00000000 r6 : 00000012 r5 : 00000201 r4 : 40026202<br>
[ 2280.473452] r3 : befff8f8 r2 : 00000009 r1 : 40026200 r0 : 40026202<br>
[ 2280.479937] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user<br>
[ 2280.487354] Control: 0400397f Table: 87bf0018 DAC: 00000015<br>
<br>
Register -> Variable mapping:<br>
-----------------------------<br>
<br>
r6,r7: 'masked' --> strange: it seems to be always 0x12<br>
r8: 'm'<br>
r5: 'n'<br>
r2: 'masked' >> 'n & 0xff' (r3 destroyed by code calling strlen())<br>
<br>
<br>
The PC is in strlen(), which was called from _dl_important_hwcaps() (see LR):<br>
<br>
00016a80 <strlen>:<br>
16a80: e3c01003 bic r1, r0, #3 ; 0x3<br>
16a84: e4912004 ldr r2, [r1], #4<br>
<br>
000147ec <_dl_important_hwcaps>:<br>
14aa4: e1961007 orrs r1, r6, r7 <<<< the two 32 bit words of 'masked'<br>
14aa8: 0a000023 beq 14b3c <_dl_important_hwcaps+0x350><br>
14aac: e51b4068 ldr r4, [fp, #-104]<br>
14ab0: e51b206c ldr r2, [fp, #-108]<br>
14ab4: e3a00001 mov r0, #1 ; 0x1<br>
14ab8: e3a01000 mov r1, #0 ; 0x0<br>
14abc: e084e002 add lr, r4, r2<br>
14ac0: e28e4050 add r4, lr, #80 ; 0x50<br>
14ac4: e3a05000 mov r5, #0 ; 0x0 <<<< this is 'n'<br>
14ac8: ec41000a tmcrr wr10, r0, r1<br>
14acc: ea000003 b 14ae0 <_dl_important_hwcaps+0x2f4><br>
14ad0: e1961007 orrs r1, r6, r7 <<<< start of loop<br>
14ad4: e284400a add r4, r4, #10 ; 0xa<br>
14ad8: 0a000017 beq 14b3c <_dl_important_hwcaps+0x350><br>
14adc: e2855001 add r5, r5, #1 ; 0x1<br>
14ae0: ec476004 tmcrr wr4, r6, r7 <<<< move 'masked' into cp<br>
14ae4: ee085110 tmcr wcgr0, r5<br>
14ae8: eee40148 wsrldg wr0, wr4, wcgr0 <<<< right shift of 'masked' for n & 0xff bits<br>
14aec: ec532000 mra r2, r3, acc0 <<<< this is tricky... acc0 is wr0<br>
14af0: e2020001 and r0, r2, #1 ; 0x1<br>
14af4: e3500000 cmp r0, #0 ; 0x0 <<<< that's the 'if (...'<br>
14af8: 0afffff4 beq 14ad0 <_dl_important_hwcaps+0x2e4><br>
14afc: e51b3044 ldr r3, [fp, #-68]<br>
14b00: e1a00004 mov r0, r4<br>
14b04: e7834188 str r4, [r3, r8, lsl #3]<br>
>> 14b08: eb0007dc bl 16a80 <strlen><br>
14b0c: e51b1044 ldr r1, [fp, #-68]<br>
14b10: ee095110 tmcr wcgr1, r5 <<<< move 'n' into cp<br>
14b14: eeda5149 wslldg wr5, wr10, wcgr1 <<< left shift of 1 for n & 0xff bits<br>
14b18: e081c188 add ip, r1, r8, lsl #3<br>
14b1c: e58c0004 str r0, [ip, #4]<br>
14b20: ec510005 tmrrc r0, r1, wr5 <<<< split 64 bit word<br>
14b24: e0266000 eor r6, r6, r0 <<<< and do the ^=<br>
14b28: e0277001 eor r7, r7, r1<br>
14b2c: e1961007 orrs r1, r6, r7<br>
14b30: e2888001 add r8, r8, #1 <<<< this is 'm'<br>
14b34: e284400a add r4, r4, #10<br>
14b38: 1affffe7 bne 14adc <_dl_important_hwcaps+0x2f0><br>
<br>
The corresponding code is<br>
<br>
------------<br>
uint64_t masked = GLRO(dl_hwcap) & GLRO(dl_hwcap_mask);<br>
<br>
for (n = 0; masked != 0; ++n)<br>
if ((masked & (1ULL << n)) != 0)<br>
{<br>
temp[m].str = _dl_hwcap_string (n);<br>
temp[m].len = strlen (temp[m].str);<br>
masked ^= 1ULL << n;<br>
++m;<br>
}<br>
------------<br>
<br>
<br>
The crash happens because 'n' (stored in r5) becomes 0x201 and strlen() is<br>
called on an array element outside of the mapped memory.<br>
<br>
The value of 0x201 means that the loop has missed its end condition twice<br>
(first time for 0x01 and second time for 0x101).<br>
<br>
The assertion probably triggers when the loop exits in the second round<br>
(0x101).<br>
<br>
<br>
The generated assembly code looks sane to me and I do not see how 'n' can<br>
become >64 (or I looked at it too long and missed the obvious). Within<br>
the loop, only strlen() is called and this alters r0-r3 only.<br>
<br>
As I wrote above, 'masked' seems to be always '0x12', which means that<br>
it is always the same bits that have not been cleared.<br>
<br>
<br>
Does anybody have other ideas or a solution?<br>
<br>
<br>
<br>
Enrico<br>
<br>
Footnotes:<br>
[1] <a href="http://sourceware.org/bugzilla/show_bug.cgi?id=6729" target="_blank">http://sourceware.org/bugzilla/show_bug.cgi?id=6729</a><br>
[2] <a href="http://bugs.gentoo.org/show_bug.cgi?id=194973" target="_blank">http://bugs.gentoo.org/show_bug.cgi?id=194973</a><br>
[3] <a href="https://www.cvg.de/people/ensc/iwmmxt-ld.tar.gz" target="_blank">https://www.cvg.de/people/ensc/iwmmxt-ld.tar.gz</a><br>
<br>
<br>
</blockquote></div><br>