[PATCH] ARM: fix cpu_relax() in case of doing dmb

Jon Medhurst (Tixy) tixy at linaro.org
Fri Aug 24 05:14:12 EDT 2012


On Fri, 2012-08-24 at 09:15 +0800, Shawn Guo wrote:
> On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote:
> > On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> > > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > > > index 99afa74..7cc67ce 100644
> > > > > --- a/arch/arm/include/asm/processor.h
> > > > > +++ b/arch/arm/include/asm/processor.h
> > > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > > > >  unsigned long get_wchan(struct task_struct *p);
> > > > >  
> > > > >  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > > > -#define cpu_relax()			smp_mb()
> > > > > +#define cpu_relax()		do {					\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > 
> > > > Can you use nop() instead of the explicit asm?
> > > 
> > > Yes.  I just tried, and it works too.
> > > 
> > > > Also, I think we should try
> > > > and use some methodology on deciding the number of nops to insert. Without
> > > > having a full handle on the problem at the moment, it would seem that we
> > > > need at least NR_CPUS worth (since the number of spinning secondaries is
> > > > NR_CPUS-1 and they may execute their barriers in lock-step).
> > > > 
> > > I'm not sure we get something like that.  In my testing here, I need
> > > at least 5 nops to get rid of the issue.
> > 
> > Doesn't A9 do dual issue?
> 
> Do you have some details about the issue to share?

I don't have any particular insight; I was just making the observation
that if the CPU clock cycles executed in the loop were a consideration,
then the fact that the A9 would probably execute two nops per clock
cycle would be pertinent.
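
To put rough numbers on that observation, here is a minimal sketch. It
assumes the nops are independent, that the core can issue up to two of
them per cycle, and that branch and barrier costs can be ignored.
nop_issue_cycles() is a hypothetical helper for illustration, not a
kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of issue cycles needed for n independent
 * nops on a core that can issue up to issue_width instructions per
 * cycle. Assumption: perfect pairing, no branch or barrier cost.
 */
static unsigned int nop_issue_cycles(unsigned int n, unsigned int issue_width)
{
	assert(issue_width > 0);

	/* Integer form of ceil(n / issue_width). */
	return (n + issue_width - 1) / issue_width;
}
```

Under these assumptions nop_issue_cycles(5, 2) gives 3 and
nop_issue_cycles(3, 2) gives 2, so counting nops and counting cycles
are not the same thing on a dual-issue core.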

> > If so, the maths for your 4 core iMX6Q might
> > match up with Will's hypothesis. You could try the theory by building
> > say with CONFIG_NR_CPUS == 3.
> > 
> I'm still not quite sure about the hypothesis, but I assume you are
> asking if 3 NOPs will fix the issue.  If so, the answer is NO.
> I increase the number of NOP incrementally starting from 1, and the
> issue remains until we have 5 NOPs in there.

Right, your and Hui's tests seem to scupper the idea that the number of
nops should be related to the number of CPUs. (Which I didn't really
understand either.)

Perhaps the issue is related to the size of the loop buffer,
or possibly cache line boundaries?
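
One way to check the cache line idea would be to look at where the
spin loop lands in the built image. A minimal sketch, assuming 4-byte
ARM instructions and a 32-byte L1 line (both are assumptions to verify
against the actual core and build); loop_crosses_line() is a
hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_LINE_SIZE	32u	/* assumption: 32-byte L1 cache line */
#define ASSUMED_INSN_SIZE	4u	/* assumption: ARM (not Thumb) encoding */

/*
 * Hypothetical helper: does a loop of n_insns instructions starting at
 * address start span more than one cache line?
 */
static bool loop_crosses_line(uintptr_t start, unsigned int n_insns)
{
	uintptr_t last;

	assert(n_insns > 0);

	/* Address of the last byte of the last instruction in the loop. */
	last = start + (uintptr_t)n_insns * ASSUMED_INSN_SIZE - 1;

	return (start / ASSUMED_LINE_SIZE) != (last / ASSUMED_LINE_SIZE);
}
```

The start address would come from the disassembly of the loop in
question. Adding nops changes n_insns, so a loop that fits in one line
with few nops may straddle two with more, or the other way round if
the padding shifts where the loop starts.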

-- 
Tixy

More information about the linux-arm-kernel mailing list