SMP issues on ARM11 MPCore

Russell King - ARM Linux linux at arm.linux.org.uk
Fri Jan 1 15:58:50 EST 2010


On Thu, Dec 24, 2009 at 03:27:09PM +0000, mkl lin wrote:
> 
> hi, 
> 
> I'm using ARM11 MPCore with 2 CPU, Linux-2.6.31.1, SMP enabled,
> L1 enabled, L2 disabled

An ARM board - which one?

> Under SMP environment, I have observed following issues:
> 
> case 1
> Sometimes, console became extremely slow, print 1 character for 1-2 seconds
> RVDS say that both CPU are idling.  kernel seems fine because messages
> response to inserting USB flash is quick and correct.

Define "console".  What is it?  Serial port?  What messages are slow -
output from user programs or the kernel?  (Your comment seems to imply
just user programs but please confirm.)

> case 2
> Sometimes,  the Linux console halt and cannot accept any input.
> RVDS say that both CPU are idling.  kernel seems fine because messages
> response to inserting USB flash is quick and correct.

These two sound like a problem with interrupts - userspace console IO
is interrupt driven, whereas kernel console IO is not.

> case 3
> Sometimes, the test stop with no reason or some fault like segmentation
> fault and return to console prompt or login prompt.

No idea without further information.  Try enabling the user debugging
options, and pass user_debug=31 to the kernel to get messages from the
kernel on userspace segfaults.
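
For reference, 31 is simply all five user debug bits set.  A rough
sketch of the decoding follows.  The bit values are written from memory
of the ARM headers of this era, so treat them (and the descriptions) as
assumptions and check them against your own tree.  udbg_describe() is a
made-up helper for illustration, not a kernel function.

```c
#include <stdio.h>

/* Assumed values; compare with the UDBG_* definitions in your tree. */
#define UDBG_UNDEFINED (1u << 0)
#define UDBG_SYSCALL   (1u << 1)
#define UDBG_BADABORT  (1u << 2)
#define UDBG_SEGV      (1u << 3)
#define UDBG_BUS       (1u << 4)

struct udbg_bit {
    unsigned int bit;
    const char *what;
};

static const struct udbg_bit udbg_bits[] = {
    { UDBG_UNDEFINED, "undefined instructions" },
    { UDBG_SYSCALL,   "bad system calls" },
    { UDBG_BADABORT,  "unhandled aborts" },
    { UDBG_SEGV,      "segmentation faults (SIGSEGV)" },
    { UDBG_BUS,       "bus errors (SIGBUS)" },
};

/* Print each event selected by mask; return how many were selected. */
int udbg_describe(unsigned int mask)
{
    int n = 0;
    size_t i;

    for (i = 0; i < sizeof(udbg_bits) / sizeof(udbg_bits[0]); i++) {
        if (mask & udbg_bits[i].bit) {
            printf("  0x%02x  %s\n", udbg_bits[i].bit, udbg_bits[i].what);
            n++;
        }
    }
    return n;
}
```

udbg_describe(31) lists all five events; udbg_describe(8) would select
segmentation faults only.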

> case 4
> Sometimes,
> the test stop with no reason, but not returning to console prompt. The
> console can accept input, but no further response, nor prompt.

Have you tried enabling 'Magic SysRq' and sending <break>t to list the
task state in the system?  '<break>' there is whatever you need to do
to cause your serial terminal program to send a break condition.  For
minicom, that defaults to ^A f.
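
If sending a break is awkward and you have some other way in (a network
login, say), writing the command character to /proc/sysrq-trigger should
produce the same task dump.  A minimal sketch, assuming Magic SysRq is
enabled, /proc is mounted and you are root; sysrq_trigger() is a
hypothetical helper name, and the path is a parameter only so that it
can be tried against an ordinary file first.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write one SysRq command character to the trigger file at 'path'.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int sysrq_trigger(const char *path, char cmd)
{
    ssize_t n;
    int fd;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    n = write(fd, &cmd, 1);
    close(fd);

    return n == 1 ? 0 : -1;
}
```

On the target this would be called as
sysrq_trigger("/proc/sysrq-trigger", 't'), with the output appearing in
the kernel log.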

> [   57.090000] MYDRIVER_exit:
> [   57.110000] MYDRIVER_init:
> [   57.150000] MYDRIVER_exit:
> [   57.180000] MYDRIVER_init:
> [   57.210000] MYDRIVER_exit:
> [   57.240000] MYDRIVER_init:
> [   57.270000] MYDRIVER_exit:
> [   57.300000] MYDRIVER_init:
> [   57.320000] sh: unhandled page fault (11) at 0x000b7dfc, code 0x017
> [   57.320000] pgd = c78b4000
> [   57.330000] [000b7dfc] *pgd=038f4031, *pte=00000000, *ppte=00000000
> [   57.350000]
> [   57.360000] Pid: 350, comm:                   sh
> [   57.370000] CPU: 1    Not tainted  (2.6.31.1-cavm1 #53)
> [   57.390000] PC is at 0x40058d04
> [   57.400000] LR is at 0xb7df8
> [   57.400000] pc : [<40058d04>]    lr : [<000b7df8>]    psr: 60000010
> [   57.400000] sp : bec8b6b8  ip : 0001d020  fp : 00000000
> [   57.440000] r10: 00000000  r9 : bec8b728  r8 : 00000002
> [   57.460000] r7 : 0009c038  r6 : 0001d028  r5 : 4009fe40  r4 : 400a02f8
> [   57.470000] r3 : 00000049  r2 : 0009add8  r1 : 0009add8  r0 : 00000049
> [   57.490000] Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
> [   57.520000] Control: 00c5787d  Table: 078b400a  DAC: 00000015
> Segmentation fault

This seems to imply that 0xb7dfc is an invalid address for 'sh'.
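
The "code 0x017" in that message is the fault status register value.
Decoded as sketched here, it is a page translation fault on a read in
domain 1, which agrees with *pte=00000000: there is no page mapped at
that address.  The field layout is my reading of the ARMv6 short
descriptor FSR format, and only a few status values are named;
fsr_decode() and fsr_status_name() are illustrative helpers, not kernel
code.

```c
#include <stdio.h>

struct fsr_fields {
    unsigned int status;  /* FS[4:0]: FSR bit 10 and bits 3:0 */
    unsigned int domain;  /* FSR bits 7:4 */
    unsigned int write;   /* FSR bit 11: 1 = write, 0 = read */
};

/* Split a raw FSR value into its fields. */
struct fsr_fields fsr_decode(unsigned int fsr)
{
    struct fsr_fields f;

    /* Bit 10 of the FSR becomes bit 4 of the status code. */
    f.status = (fsr & 0xf) | ((fsr >> 6) & 0x10);
    f.domain = (fsr >> 4) & 0xf;
    f.write  = (fsr >> 11) & 1;
    return f;
}

/* Name a handful of common status codes; the rest are left unnamed. */
const char *fsr_status_name(unsigned int status)
{
    switch (status) {
    case 0x01: return "alignment fault";
    case 0x05: return "section translation fault";
    case 0x07: return "page translation fault";
    case 0x0d: return "section permission fault";
    case 0x0f: return "page permission fault";
    default:   return "other (see the architecture manual)";
    }
}
```

For the log above, fsr_decode(0x017) gives status 0x07, domain 1 and
write 0.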

> -------
> 
> Without DCache and CONFIG_LOCAL_TIMERS, I can repeat the above procedure
> for 216 seconds, then it halted as case 4. Case 1 also exists with this
> config.

You can't disable the data cache in SMP mode.

It could be something to do with write allocate caches - we don't support
these particularly well in the kernel, and I wouldn't be surprised if
you've found some problem there.

The fact that it only happens in SMP mode rather points at that, because
that's one of the few hardware configurations which does have write 
allocate caches.  To confirm this, we need someone who can run your
tests on a UP platform which has write allocate caches...


