[PATCH] ARM: omap2+: Revert omap-smp.c changes resetting cpu1 during boot

Tony Lindgren tony at atomide.com
Thu Feb 16 08:54:09 PST 2017


* Andrew F. Davis <afd at ti.com> [170216 08:30]:
> On 02/16/2017 10:10 AM, Tony Lindgren wrote:
> > * Tony Lindgren <tony at atomide.com> [170215 14:28]:
> >> * Andrew F. Davis <afd at ti.com> [170215 14:14]:
> >>> On 02/15/2017 01:12 PM, Tony Lindgren wrote:
> >>>> And also the same issue happens doing kexec on beagle-x15 naturally if
> >>>> the cpu1 reset is removed.
> >>>>
> >>>
> >>> When a core actually powers up, it idles in ROM code waiting for
> >>> OMAP_AUX_CORE_BOOT_0 to be set. When we shut down a core, it is not really
> >>> powered off; we just let it spin in omap4_cpu_die() or
> >>> omap4_secondary_startup() waiting on OMAP_AUX_CORE_BOOT_0, just as if
> >>> it were still trapped in ROM after a reset.
> > 
> > OK so I debugged this a bit more. We have CPU1 in omap_do_wfi()
> > as we don't currently have omap5_secondary_startup() or any deeper
> > idle mode support beyond retention for omap5 or dra7 in the mainline
> > kernel.
> > 
> >>> The issue with these fake startup idle loops is that, unlike the ROM-based
> >>> startup idle loop, they do *not* jump to the address we stored in
> >>> OMAP_AUX_CORE_BOOT_1; they just assume that they can safely
> >>> jump to the kernel startup function.
> > 
> > This does not seem to be the case here.
> > 
> 
> Well, this is what I am seeing every time: this code only works when we
> kexec the same kernel. If any of these addresses change, it will not work.

Hmm, let's talk about the mainline kernel here. Currently things do work in
the mainline kernel because of the cpu1 reset. And without the cpu1 reset,
things will currently go wrong in the mainline kernel for both kexec
and suspend/resume.

> >>> So when we tell this core to boot, and it is not in the real ROM startup
> >>> loop, it breaks stuff as it jumps to the old kernel's
> >>> secondary_startup() even though we gave it the correct address in
> >>> OMAP_AUX_CORE_BOOT_1.
> > 
> > And this is not happening. I think this is what I was seeing earlier,
> > but it's not the omap5/dra7 issue.
> > 
> > What we have is cpu1 returning from previous kernel's omap_do_wfi()
> > in the kexec booted kernel's code and that's when things go wrong.
> > 
> 
> We are the ones sending it to omap_do_wfi(): in omap4_cpu_die() it gets
> idled in a loop. It shouldn't be idled after it is shut off; it should
> get parked, the same way we do it in omap5_secondary_startup().

Yup, agreed. We need to figure out whether it's just a normal cpuidle
hot-unplug event or a shutdown where cpu1 needs to be parked for kexec.
cpu_kill() is probably the place to park it; I need to check.
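To make the difference between a real park loop and the current idle loop concrete, here is a minimal user-space C sketch. It assumes plain variables stand in for the OMAP_AUX_CORE_BOOT_0 ("go" flag) and OMAP_AUX_CORE_BOOT_1 (entry address) registers; rom_style_park(), stale_park() and the two *_secondary_startup() stubs are hypothetical names, not the omap-smp.c code. rom_style_park() models the ROM behavior Andrew describes (wait for the flag, then use whatever address BOOT_1 holds), while stale_park() models a loop that ignores BOOT_1 and uses the startup address fixed by the kernel that parked the core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*entry_fn)(void);

/* Hypothetical stand-ins for OMAP_AUX_CORE_BOOT_0/1. volatile mirrors
 * MMIO register access here; it is not a threading primitive. */
static volatile uint32_t aux_core_boot_0;  /* nonzero means "go" */
static entry_fn volatile aux_core_boot_1;  /* entry address from the booting kernel */

/* Records which (hypothetical) startup entry point ran last. */
static int last_entry;

static void old_kernel_secondary_startup(void) { last_entry = 1; }
static void new_kernel_secondary_startup(void) { last_entry = 2; }

/* ROM-style park: spin until released, then return the entry address
 * currently stored in BOOT_1. Real code would wfe() and jump instead. */
static entry_fn rom_style_park(void)
{
	while (aux_core_boot_0 == 0)
		;
	assert(aux_core_boot_1 != NULL);
	return aux_core_boot_1;
}

/* Problematic variant: also waits on BOOT_0, but ignores BOOT_1 and
 * returns the startup address baked in by the kernel that parked it. */
static entry_fn stale_park(entry_fn baked_in_startup)
{
	while (aux_core_boot_0 == 0)
		;
	return baked_in_startup;
}
```

If the kexec'd kernel stores new_kernel_secondary_startup in BOOT_1 and then sets BOOT_0, rom_style_park() hands back the new entry point, while stale_park(old_kernel_secondary_startup) still hands back the old one, which is the failure mode described above for the fake startup loops.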

> > So if cpu1 was configured for idle for any reason, it currently never
> > gets to omap5_secondary_startup without the reset.
> > 
> > The reason kexec and suspend/resume mostly work for omap4 without
> > the cpu1 reset is that we usually enter off mode for cpu1, so the context
> > is lost and we then properly go through omap4_secondary_startup. Or
> > that's my theory so far for the occasional flakiness I've been seeing :)
> > 
> > Any ideas what we could check to see whether cpu1 is in idle
> > mode, so we can do the reset if needed?
> > 
> 
> You can never reset the core: resetting it is not allowed on HS
> devices, so it really doesn't matter what the core is doing. In no
> case is resetting the core a valid workaround for not parking it
> correctly. If the return-from-idle path is the problem, we need to fix
> omap4_cpu_die() so that it does not let the core go idle.

Yeah, well, from the Linux point of view, what we're interested in is that
cpu1 comes up reliably in all cases, no matter what it takes. I agree a
reset should only be done if nothing else helps, and I can see some HS
implementations not allowing a cpu1 reset. I can also see some
product-specific bootloaders idling cpu1, and that's where things
break again.

For your use case, all we probably need is a runtime check for HS devices
in addition to parking cpu1 for kexec. If that's not enough, then maybe
a device-specific DT property for never-reset-no-matter-what.
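The policy above could look roughly like this minimal C sketch. It assumes the kernel can tell at runtime whether it is on an HS device, whether a DT property forbids the reset, and whether cpu1 was parked for kexec rather than left idling in omap_do_wfi(); the struct, enum, function and property are all hypothetical, not existing kernel APIs. Parking is preferred, and the reset is only a fallback where it is permitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; none of these names are existing kernel APIs. */
struct cpu1_boot_state {
	bool is_hs_device;     /* high-security part: cpu1 reset not allowed */
	bool dt_never_reset;   /* hypothetical DT "never reset" property present */
	bool parked_for_kexec; /* cpu1 was parked, not left in omap_do_wfi() */
};

enum cpu1_action {
	CPU1_RELEASE_FROM_PARK, /* write BOOT_1, then set BOOT_0 */
	CPU1_RESET,             /* last resort, only where permitted */
	CPU1_NO_SAFE_OPTION,    /* not parked and reset forbidden */
};

static enum cpu1_action choose_cpu1_action(const struct cpu1_boot_state *s)
{
	assert(s != NULL);

	/* Preferred: cpu1 sits in a loop that honors the BOOT_1 address. */
	if (s->parked_for_kexec)
		return CPU1_RELEASE_FROM_PARK;

	/* Fallback: reset only if neither HS nor DT forbids it. */
	if (!s->is_hs_device && !s->dt_never_reset)
		return CPU1_RESET;

	return CPU1_NO_SAFE_OPTION;
}
```

The CPU1_NO_SAFE_OPTION case is the one Andrew is pointing at: on HS parts the only robust answer is to make sure cpu1 is always parked properly before handing over to the next kernel.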

Regards,

Tony



More information about the linux-arm-kernel mailing list