[PATCH v3 2/2] arm64: use WFE for long delays
Will Deacon
will.deacon at arm.com
Wed Oct 11 08:13:35 PDT 2017
Hi Julien,
On Fri, Sep 29, 2017 at 11:52:30AM +0100, Julien Thierry wrote:
> The current delay implementation uses the yield instruction, which is a
> hint that it is beneficial to schedule another thread. As this is a hint,
> it may be implemented as a NOP, causing all delays to be busy loops. This
> is the case for many existing CPUs.
>
> Taking advantage of the generic timer sending periodic events to all
> cores, we can use WFE during delays to reduce power consumption. This is
> beneficial only for delays longer than the period of the timer event
> stream.
>
> If timer event stream is not enabled, delays will behave as yield/busy
> loops.
>
> Signed-off-by: Julien Thierry <julien.thierry at arm.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Will Deacon <will.deacon at arm.com>
> Cc: Mark Rutland <mark.rutland at arm.com>
> ---
> arch/arm64/lib/delay.c | 23 +++++++++++++++++++----
> include/clocksource/arm_arch_timer.h | 4 +++-
> 2 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> index dad4ec9..4dc27f3 100644
> --- a/arch/arm64/lib/delay.c
> +++ b/arch/arm64/lib/delay.c
> @@ -24,10 +24,28 @@
> #include <linux/module.h>
> #include <linux/timex.h>
>
> +#include <clocksource/arm_arch_timer.h>
> +
> +#define USECS_TO_CYCLES(TIME_USECS) \
> + xloops_to_cycles((TIME_USECS) * 0x10C7UL)
The macro parameter can be lower-case here.
> +static inline unsigned long xloops_to_cycles(unsigned long xloops)
> +{
> + return (xloops * loops_per_jiffy * HZ) >> 32;
> +}
> +
> void __delay(unsigned long cycles)
> {
> cycles_t start = get_cycles();
>
> + if (arch_timer_evtstrm_available()) {
Hmm, is this never called in a context where preemption is enabled?
Maybe arch_timer_evtstrm_available should be using raw_smp_processor_id()
under the hood.
> + const cycles_t timer_evt_period =
> + USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);
> +
> + while ((get_cycles() - start + timer_evt_period) < cycles)
> + wfe();
> + }
> +
> while ((get_cycles() - start) < cycles)
> cpu_relax();
> }
> @@ -35,10 +53,7 @@ void __delay(unsigned long cycles)
>
> inline void __const_udelay(unsigned long xloops)
> {
> - unsigned long loops;
> -
> - loops = xloops * loops_per_jiffy * HZ;
> - __delay(loops >> 32);
> + __delay(xloops_to_cycles(xloops));
> }
> EXPORT_SYMBOL(__const_udelay);
>
> diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
> index 4e28283..349e595 100644
> --- a/include/clocksource/arm_arch_timer.h
> +++ b/include/clocksource/arm_arch_timer.h
> @@ -67,7 +67,9 @@ enum arch_timer_spi_nr {
> #define ARCH_TIMER_USR_VT_ACCESS_EN (1 << 8) /* virtual timer registers */
> #define ARCH_TIMER_USR_PT_ACCESS_EN (1 << 9) /* physical timer registers */
>
> -#define ARCH_TIMER_EVT_STREAM_FREQ 10000 /* 100us */
> +#define ARCH_TIMER_EVT_STREAM_PERIOD_US 100
> +#define ARCH_TIMER_EVT_STREAM_FREQ \
> + (USEC_PER_SEC / ARCH_TIMER_EVT_STREAM_PERIOD_US)
This needs an ack from Marc or Mark.
Will
More information about the linux-arm-kernel
mailing list