[RFC] Make SMP secondary CPU up more resilient to failure.
Andrei Warkentin
andreiw at motorola.com
Sat Dec 18 07:10:49 EST 2010
On Sat, Dec 18, 2010 at 6:01 AM, Russell King - ARM Linux
<linux at arm.linux.org.uk> wrote:
> On Sat, Dec 18, 2010 at 01:17:34AM -0600, Andrei Warkentin wrote:
>> I do want to bring to your attention something I found while making
>> the patch I mailed here. It's that
>> secondary_startup (along with MMU-enabling friends) are located in the
>> .init.text section and get nuked after
>> startup. Which means that the secondary startup right now can't be
>> done without moving this code out of .init.text.
>> Moving that stuff is kind of iffy too, because it's used immediately
>> in head.S. I didn't try very hard :) (and it didn't work, but that's
>> my fault). Now I have a good reason to try again :).
>
> I have a series of 10 patches which does that, posted previously to this
> mailing list on 4th October which addresses that.
>
> There's another series of 22 patches which clean up the SMP support which
> you also probably aren't aware of (Daniel Walker is though.) That now
> has the VFP and pen_release fixes I just posted added to it, and it's
> also going to make the failure-to-online message a little more verbose
> as to why the failure happened.
>
>> >> >> /* return happens from __cortex_a9_restore */
>> >> >> barrier();
>> >> >> writel(smp_processor_id(), EVP_CPU_RESET_VECTOR);
>> >> >
>> >> > And here, if the CPU is not _immediately_ stopped, it will return and
>> >> > restart the bring-up. If you are using hardware to reset the CPU (which
>> >> > it seems you are), you should ensure that this function never returns.
>> >> > You may also wish to add a pr_crit() here to indicate when this failure
>> >> > happens.
>> >> >
>> >>
>> >> It is immediately stopped. Turned off really. But this is where it
>> >> will resume on when it gets brought back up (__cortex_a9_restore takes
>> >> care of that)
>> >
>>
>> But this is a writel to memory. And CONFIG_ARM_DMA_MEM_BUFFERABLE is
>> defined, so there is a __wmb() after the write, and in our case
>> #define wmb() do { dsb(); outer_sync(); } while (0)
>>
>> The dsb will flush the write buffer, unless I misunderstand.
>
> writel() is defined as follows:
>
> #define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })
>
> Notice that the wmb barrier is performed before the actual write to device,
> not after it. That is to ensure that if this device write is telling
> something to start DMA, any previous writes to DMA coherent memory (eg,
> for ring descriptors) will be visible to the DMA agent.
>
...and in that last case there are no writels following the write of
SMP id to reset vector. Oops :/.
> There are no ordering guarantees between the device write vs accesses
> after the writel().
>
>> Keep in mind that this is not why we were seeing that failure. If it
>> failed here, secondary_boot spinning on this value wouldn't have
>> succeeded, and it wouldn't even try spinning with a timeout on the
>> cpu_online bit. And while I agree with you that the hotplug code code
>> be severely less byzantine (and help everybody all the other mach SMP
>> efforts in a generic fashion), it's not the problem either - it works
>> - boot_secondary returns success and cpu_up enters the timeout wait
>> for the online bit.
>
> Well, the code has issues as it currently stands, and I'd like to see
> some of those issues resolved first, and then re-checking where we stand
> with this.
>
>> I debugged through what was happening, and boot_secondary was
>> succeeding (this means that the secondary IS now about to jump to
>> secondary_start_kernel).
>
> I assume you are aware that your boot_secondary() _always_ succeeds, even
> if this times out?
>
> timeout = jiffies + HZ;
> while (time_before(jiffies, timeout)) {
> if (readl(EVP_CPU_RESET_VECTOR) != boot_vector)
> break;
> udelay(10);
> }
> ...
> spin_unlock(&boot_lock);
>
> return 0;
>
> It would be good if you fixed that and re-tested - what could be
> happening is exactly the same thing you're accusing the generic code
> of doing:
>
> CPU0 CPU1
> writes RESET_VECTOR
> prods CPU
> starts wait
> cpu starts slowly
> times out
> restores RESET_VECTOR
> writes RESET_VECTOR
> returns success
> starts wait for CPU online
> ...
> times out
> returns -EIO
> cpu comes online
>
> You'll never know if your reset vector has been corrupted in this way
> because you always return success.
Gah, that's terrible...how could I have missed this... I'll go put the
foot back in my mouth :).
>
>> But the wait to set the online bit timed out.
>> Not by much (judging from kernel messages), but timed out. Now, this
>> was on a IO/memory/compute overloaded system, this isn't supposed to
>> happen in real life, but it happened, and it's not helping our stress
>> tests ;-). Basically there is no safe guards to prevent something like
>> this from happening, hence the patch.
>
> We assume in __cpu_up() is that if boot_secondary() returns success, the
> CPU is well on its way to coming online - iow, platform_secondary_init()
> has been reached, and we have relatively little stuff to do before we see
> the CPU coming online.
>
> I can't see that your code allows that assumption to be made, as
> boot_secondary() always returns success.
>
> Maybe you need your wait-for-reset-vector to be longer, and on failure
> place the CPU back into reset to ensure that it doesn't corrupt the
> reset vector if it starts late?
>
Definitely. This would be exactly the right place to place any holding logic...
Thanks again,
A
More information about the linux-arm-kernel
mailing list