[PATCH v9 01/15] asm-generic: add barrier smp_cond_load_relaxed_timeout()
Ankur Arora
ankur.a.arora at oracle.com
Fri Nov 8 14:15:53 PST 2024
Christoph Lameter (Ampere) <cl at gentwo.org> writes:
> On Thu, 7 Nov 2024, Ankur Arora wrote:
>
>> > Calling the clock retrieval function repeatedly should be fine and is
>> > typically done in user space as well as in kernel space for functions that
>> > need to wait short time periods.
>>
>> The problem is that you might have multiple CPUs polling in idle
>> for prolonged periods of time. And, so you want to minimize
>> your power/thermal envelope.
>
> On ARM that maps to YIELD which does not do anything for the power
> envelope AFAICT. It switches to the other hyperthread.

Agreed. For arm64, patch 5 adds a specialized version.
In the fallback case, where we don't have an event stream, the
arm64 version does use the same cpu_relax() loop, but that's
not a production configuration.

>> For instance see commit 4dc2375c1a4e "cpuidle: poll_state: Avoid
>> invoking local_clock() too often" which originally added a similar
>> rate limit to poll_idle() where they saw exactly that issue.
>
> Looping w/o calling local_clock may increase the wait period etc.

Yeah. I don't think that's a real problem for the poll_idle()
case, as the only thing waiting on the other side of the possibly
delayed timer is a deeper idle state.

But for any other potential users the looping duration might be
too long: the generated code for x86 will execute around 200 * 7
(so roughly 1400) instructions before checking the timer, which
gives a worst-case delay of, say, around 1-2us.

I'll note that in the comment around smp_cond_time_check_count
just to warn any future users.
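
To make the trade-off above concrete, here is a minimal user-space
sketch of a rate-limited time check. It is not the code from the
patch: SPIN_CHECK_COUNT, now_ns() and poll_flag_timeout() are
hypothetical names made up for illustration, and a plain volatile
read stands in for the kernel's cpu_relax()/READ_ONCE() loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical value, mirroring the ~200 iterations mentioned above. */
#define SPIN_CHECK_COUNT 200

/* Monotonic clock in nanoseconds (user-space stand-in for local_clock()). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until *flag becomes non-zero or timeout_ns has elapsed.
 * Returns true if the flag was observed set, false on timeout.
 */
static bool poll_flag_timeout(const volatile int *flag, uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;
	unsigned int spin = 0;

	for (;;) {
		if (*flag)
			return true;

		/* Read the clock only once every SPIN_CHECK_COUNT iterations. */
		if (++spin < SPIN_CHECK_COUNT)
			continue;
		spin = 0;

		if (now_ns() >= deadline)
			return false;
	}
}
```

The cost of skipping the clock read is visible in the loop: the
deadline can be overshot by up to SPIN_CHECK_COUNT iterations, which
is the delay any future user would need to tolerate.
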
> For power saving most arches have special instructions like ARM's
> WFE/WFET. These are then causing more accurate wait times than the looping
> thing?

Definitely true for WFET. WFE can still overshoot because the
event stream has a period of 100us.
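
As a rough illustration of that overshoot, the sketch below assumes
the waiter only rechecks its deadline when an event-stream tick wakes
it, and that ticks arrive exactly every 100us. The helper name
wfe_overshoot_ns() and the tick_ns parameter (the time of some tick
at or before the deadline) are hypothetical, not anything from the
series.

```c
#include <stdint.h>

/* Event-stream period of 100us, as stated above. */
#define EVENT_STREAM_PERIOD_NS 100000ull

/*
 * Return how far past deadline_ns the first event-stream tick at or
 * after the deadline lands, given a reference tick at tick_ns.
 * Assumes deadline_ns >= tick_ns.
 */
static uint64_t wfe_overshoot_ns(uint64_t deadline_ns, uint64_t tick_ns)
{
	uint64_t rem = (deadline_ns - tick_ns) % EVENT_STREAM_PERIOD_NS;

	/* A deadline that falls exactly on a tick is noticed immediately. */
	return rem ? EVENT_STREAM_PERIOD_NS - rem : 0;
}
```

Under these assumptions the overshoot is always less than one
event-stream period, so a deadline just after a tick is the bad case.
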
--
ankur