s2idle breaks on machines without cpuidle support

Florian Fainelli florian.fainelli at broadcom.com
Thu Sep 7 12:11:22 PDT 2023


On 2/8/23 08:45, Hector Martin wrote:
> On 09/02/2023 01.18, Sudeep Holla wrote:
>> On Thu, Feb 09, 2023 at 12:42:17AM +0900, Hector Martin wrote:
>>> On 08/02/2023 19.35, Sudeep Holla wrote:
>>>> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
>>>>> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
>>>>>>
>>>>>> What do you mean by break ? More details on the observation would be helpful.
>>>>> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
>>>>> these chain of commands don't get called.
>>>>>
>>>>> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
>>>>>
>>>>> Which in turn causes programs like systemd to crash since it doesn't
>>>>> expect this.
>>>>
>>>> Yes expected IIUC. The per-cpu timers and counters continue to tick in
>>>> WFI and hence CLOCK_MONOTONIC can't stop.
>>>
>>> The hardware counters would also keep ticking in "real" s2idle (with
>>> hypothetical PSCI idle support) and often in full suspend. There is a
>>> flag for this (CLOCK_SOURCE_SUSPEND_NONSTOP) that is set by default for
>>> ARM. So this isn't why CLOCK_MONOTONIC isn't stopping; there is
>>> machinery to make the kernel's view of time stop anyway, it's just not
>>> being invoked here.
>>>
>>
>> Indeed, and I assumed s2idle was designed with those requirements but I
>> think I may be wrong especially looking at few points you have raised
>> provide my understanding is aligned with yours.
>>
>>> This is somewhat orthogonal to the issue of PSCI. These machines can't
>>> physically support PSCI and KVM at the same time (they do not have EL3),
>>> so PSCI is not an option. We will be starting a conversation about how
>>> to provide something "like" PSCI over some other sort of transport to
>>> solve this soon. So that will "fix" this issue once it's all implemented.
>>>
>>
>> All the best for the efforts.
>>
>>> However, these machines aren't the only ones without PSCI (search for
>>> "spin-table" in arch/arm64/boot/dts, this isn't new and these aren't the
>>> first).
>>
>> Yes I am aware of it and if you see arch/arm64/kernel/smp_spin_table.c
>> we don't support CPU hotplug or suspend for such a system.
> 
> We certainly support s2idle, except it's kind of broken as stated. Try
> it, it works :-)
> 
> I didn't do anything special to enable s2idle on Apple platforms other
> than make sure random drivers weren't broken and there was at least one
> driver capable of triggering a wakeup. I just compile with
> CONFIG_SUSPEND and s2idle works. Except for the part where
> CLOCK_MONOTONIC keeps running. So generic kernels on spin_table
> platforms ought to expose (broken) s2idle by default already.
> 
>>> It seems broken that Linux currently implements s2idle in such a
>>> way that it violates the userspace clock behavior contract on systems
>>> without a cpuidle driver (and does so silently, to make it worse).
>>
>> Just to check if I understand this correctly, are you referring to:
>> cpuidle_idle_call()->default_idle_call() if cpuidle_not_available()
>> And the problem is it idles there in wfi but CLOCK_MONOTONIC isn't
>> stopping as expected by the userspace ? If so, fair enough. If not, I
>> may be missing to understand something.
> 
> Right. I'm not too certain on the details of exactly what suspend
> machinery is running/supposed to, because this CLOCK_MONOTONIC issue was
> a surprise to me when it came up. From my point of view s2idle "just
> worked", it's only now that this has come up that we're realizing it's
> winding up in a very different codepath to what would happen were
> cpuidle/PSCI available. This was all silent from the user POV (it all
> looks like it suspends/resumes normally as far as I can tell).
> 
>>> So that should be fixed regardless of whether we end up coming up with a
>>> PSCI alternative or not for these platforms.
>>
>> If above understanding is correct, I agree. But not sure what was the
>> motivation behind the current behaviour.
>>
>>> There's no fundamental reason why s2idle can't work properly with plain WFI.
>>>
>>
>> Fair enough. I hadn't thought much of it before as most of the platforms
>> I have seen or used have at-least one deeper than WFI state these days.
>> On arm32, this was common but each platform managed suspend_set_ops
>> on its own and probably can do the same s2idle_set_ops.
> 
> Yeah, we do have one deeper idle state (and we should figure out how to
> implement a PSCI alternative to enable it soon, since in particular for
> certain SoCs plain WFI is quite a power hog since it keeps all the core
> clusters powered up and at least partially clocked). But since we don't
> have that yet, we've been using WFI-only s2idle so users have *some*
> suspend ability.

Same here. I would caution against assuming that a cpuidle driver is 
always available. There are tons of systems out there that only support 
WFI as their idle state and in that case there is no backing cpuidle 
driver either, that is cpuidle_not_available() returns false. There were 
also report on x86 where PSCI is not available:

https://github.com/systemd/systemd/issues/9538
https://bugzilla.kernel.org/show_bug.cgi?id=200595

Having recently been affected in the same way by this bug, it would be 
nice if we could do something about it. I tried to do the following:

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 3f8c7867c14c..d021f6ceb352 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -90,9 +90,11 @@ void __cpuidle default_idle_call(void)
         if (current_clr_polling_and_test()) {
                 local_irq_enable();
         } else {
+               RCU_NONIDLE(tick_freeze());
                 stop_critical_timings();
                 arch_cpu_idle();
                 start_critical_timings();
+               RCU_NONIDLE(tick_unfreeze());
         }
  }

but this is not quite working and I am getting tons of warnings from 
ktime_get_seconds() early on boot.

Is there a point at which it is safe to call tick_freeze() and 
tick_unfreeze()? It is not obvious to me how cpuidle goes about not 
hitting those same warnings, short of being initialized later.

Thanks!
-- 
Florian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4221 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20230907/64803b9b/attachment-0001.p7s>


More information about the linux-arm-kernel mailing list