[PATCH v2 1/2] ARM: shmobile: r8a7740: Add Suspend-To-RAM A3SM

Bastian Hecht hechtb at gmail.com
Mon Apr 15 07:33:31 EDT 2013


Hello Lorenzo,

thanks for the review.

2013/4/12 Lorenzo Pieralisi <lorenzo.pieralisi at arm.com>:
> On Thu, Apr 11, 2013 at 03:07:43PM +0100, Bastian Hecht wrote:
>
> [...]
>
>> diff --git a/arch/arm/mach-shmobile/sleep-r8a7740.S b/arch/arm/mach-shmobile/sleep-r8a7740.S
>> new file mode 100644
>> index 0000000..762f978
>> --- /dev/null
>> +++ b/arch/arm/mach-shmobile/sleep-r8a7740.S
>> @@ -0,0 +1,72 @@
>> +/*
>> + * Low level sleep code for the SoC r8a7740
>> + *
>> + * Copyright (C) 2013 Bastian Hecht
>> + *
>> + * This file is subject to the terms and conditions of the GNU General Public
>> + * License.  See the file "COPYING" in the main directory of this archive
>> + * for more details.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +#include <linux/init.h>
>> +#include <asm/memory.h>
>> +
>> +#ifdef CONFIG_SUSPEND
>> +
>> +/* r8a7740_shutdown expects L2 to be flushed */
>> +     .text
>> +ENTRY(r8a7740_shutdown)
>> +     stmfd   sp!, {r4-r12, lr}
>> +
>> +     /* make sure the stack stays intact */
>> +     bl      v7_flush_dcache_all
>> +
>
> Why don't you move the cache flush to after clearing the C bit?
> You must not push on the stack _after_ clearing the C bit and before
> cleaning the cache; that's the only requirement (basically you
> should not write any data and must not rely on any dirty data sitting
> in the cache, since with the C bit cleared, as I already said, the A9 L1
> D-cache is not searched).
>
> As long as, with the C bit cleared, you do NOT push on the stack before
> cleaning the cache, your cache flushing routine will clean the data to DRAM,
> so you are fine. v7_flush_dcache_all does not use the stack.
>
>         mrc     p15, 0, r0, c1, c0, 0
>         bic     r0, r0, #(1 << 2)               @ Disable the C bit
>         mcr     p15, 0, r0, c1, c0, 0
>         isb
>         bl      v7_flush_dcache_all
>
> This code snippet will do, and it is compliant with the SMP procedure.
> I do not want people to copy'n'paste code that does not work on SMP.
> I know this code runs on UP, so the sequence above is safe, but why not make
> the sequence identical for both UP and SMP?

Now this looks super straightforward. I wonder how I came up with the
inverted order (even if it works in the UP case). Probably when you look
at other people's code and see a v7_flush_dcache_all before the disable,
one (meaning me...) starts to bend one's mental model of the caches
until it fits the code at hand.

Anyway: of course I'll take the code sequence that works for both the
UP and SMP cases.
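Not from the posted patch — just a sketch of how the reordered entry could
look in v3, combining the original r8a7740_shutdown prologue with the
disable-then-flush sequence Lorenzo quotes above:

```asm
ENTRY(r8a7740_shutdown)
	stmfd	sp!, {r4-r12, lr}

	/*
	 * Clear SCTLR.C first: with the C bit cleared the A9 L1 D-cache
	 * is not searched and no new lines are allocated.
	 */
	mrc	p15, 0, r0, c1, c0, 0
	bic	r0, r0, #(1 << 2)		@ Disable the C bit
	mcr	p15, 0, r0, c1, c0, 0
	isb

	/*
	 * Now clean the D-cache to DRAM. v7_flush_dcache_all does not
	 * use the stack, so nothing dirty is created between the
	 * disable and the clean.
	 */
	bl	v7_flush_dcache_all

	bl	cpu_v7_do_idle
```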

>> +     /*
>> +      * Clear the SCTLR.C bit to prevent further data cache
>> +      * allocation. Clearing SCTLR.C would make all the data accesses
>> +      * strongly ordered and would not hit the cache.
>> +      */
>> +     mrc     p15, 0, r0, c1, c0, 0
>> +     bic     r0, r0, #(1 << 2)               @ Disable the C bit
>> +     mcr     p15, 0, r0, c1, c0, 0
>> +     isb
>> +
>> +     /*
>> +      * We don't issue another v7_flush_dcache_all here, as seen in
>> +      * many other places, as we have a UP core, so the L1 could not
>> +      * have soaked up data from other L1 caches in the meantime.
>> +      */
>> +
>> +     bl      cpu_v7_do_idle
>> +
>> +     /* in rare cases when WFI fails we end up here and restore things */
>
> I am still baffled by these "wfi failures" and by how the hardware can
> manage this situation _properly_; if you can explain to me how this works,
> that would be grand.

I unfortunately don't know about the hardware design of this SoC.

> This code is also called by CPUidle right ?

I haven't experienced this when using suspend-to-RAM, only with
CPUIdle. Of course this might just be a matter of probability, though.

> So,
> IRQs can still cause wfi completion; there has to be a handshake between
> the core and the power controller to prevent the CPU from being shut down
> while executing code (power-down command sent, standbywfi asserted and
> then released following wfi completion).
>
> Anyway, I do not like the code sequence above (and I know that e.g. OMAP4
> executes the same code) because, again, it works on UP, but people might be
> tempted to copy'n'paste it into their SMP implementation (with the addition
> of the SMP bit restore in ACTLR).
>
> The OMAP4 sequence is missing a TLB flush, since the core runs for a time
> window with incoherent TLBs (out of SMP). It works because other cores
> can't update their page tables while the core exiting wfi is running out
> of coherency (i.e. coupled idle C-states: either all CPUs down or no CPU
> down), but again, that code should not be copied verbatim since it fails
> to flush the TLBs, AFAIK.
>
> Again, you are running a UP kernel, so this code does not suffer from
> the same problem, but if this code is copied and used on SMP platforms
> then we do have a problem, and that has to be prevented.
>
> Why don't you turn the MMU off (cpu_reset) and jump to the reset vector if
> wfi fails (the code in arch/arm/kernel/process.c __soft_restart(), minus
> the cache flushing)?
>
>         phys_reset_t phys_reset;
>
>         /* Take out a flat memory mapping. */
>         setup_mm_for_reboot();
>
>         phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
>         phys_reset((unsigned long)addr);
>
>         BUG();
>
> where addr == virt_to_phys(cpu_resume);
>
> from what I gather from the code below.
>
> There is just a tiny performance hit owing to a couple of fetches from
> DRAM (MMU off); then cpu_resume restores the C bit and the MMU (and flushes
> the TLBs), so the virtual address space and caches are back on. By the time
> wfi fails, the caches are invalid anyway, so in both cases they have to be
> refilled.
>
> On top of that, this code is not really SH-specific; it is plain v7 code
> that can be standardised.

OK, that sounds good. As we don't touch the hot path, I don't care much
about a few extra cycles, and I appreciate the move towards
standardization. I'll try that out and send a v3 with the two changes.
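For the record, the suggested fallback could be folded into a helper along
these lines (a sketch only: r8a7740_wfi_fallback is a hypothetical name,
the calls are the ones from __soft_restart() quoted above, and this is
kernel-internal code, not compilable standalone):

```c
/*
 * Sketch of a wfi-failure fallback: reset the core through a flat
 * mapping and re-enter via cpu_resume, as suggested above.
 */
static void r8a7740_wfi_fallback(void)	/* hypothetical helper name */
{
	phys_reset_t phys_reset;

	/* Take out a flat memory mapping so we can run with the MMU off. */
	setup_mm_for_reboot();

	/* Call cpu_reset() through its physical address; it turns the
	 * MMU off and branches to the address it is given. */
	phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
	phys_reset((unsigned long)virt_to_phys(cpu_resume));

	/* cpu_resume restores the C bit and the MMU (and flushes the
	 * TLBs), so we should never get here. */
	BUG();
}
```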

So thanks a lot,

 Bastian

> For now it is just a, hopefully useful, rant.
>
> Lorenzo
>
>> +     mrc     p15, 0, r0, c1, c0, 0
>> +     orr     r0, r0, #(1 << 2)               @ Enable the C bit
>> +     mcr     p15, 0, r0, c1, c0, 0
>> +     isb
>> +
>> +     ldmfd   sp!, {r4-r12, pc}
>> +ENDPROC(r8a7740_shutdown)
>> +
>> +     .text
>> +ENTRY(v7_cpu_resume)
>> +     bl      v7_invalidate_l1
>> +     b       cpu_resume
>> +ENDPROC(v7_cpu_resume)
>> +
>> +/*
>> + * The entry point of a warm reboot, used by wakeup scenarios
>> + *
>> + * The CPU jumps in this case to (0xfffff000 & SBAR), so we need
>> + * to align this function properly.
>> + * We use a long jump into the text segment and use the physical
>> + * address as the MMU is still turned off.
>> + */
>> +     .align  12
>> +     .text
>> +ENTRY(r8a7740_resume)
>> +     ldr     pc, 1f
>> +1:   .long   v7_cpu_resume - PAGE_OFFSET + PLAT_PHYS_OFFSET
>> +ENDPROC(r8a7740_resume)
>> +#endif
>> --
>