[PATCH v10 16/20] coresight: Add PM callbacks for sink device
James Clark
james.clark at linaro.org
Fri Apr 10 01:55:47 PDT 2026
On 09/04/2026 4:44 pm, Leo Yan wrote:
> On Thu, Apr 09, 2026 at 11:52:00AM +0100, James Clark wrote:
>
> [...]
>
>>> @@ -1787,15 +1808,32 @@ static int coresight_pm_save(struct coresight_path *path)
>>> to = list_prev_entry(coresight_path_last_node(path), link);
>>> coresight_disable_path_from_to(path, from, to);
>>> + ret = coresight_pm_device_save(coresight_get_sink(path));
>>> + if (ret)
>>> + goto sink_failed;
>>> +
>>
>> The comment directly above this says "Up to the node before sink to avoid
>> latency". But then this line goes and saves the sink anyway. So I'm not sure
>> what's meant by the comment?
>
> I will refine the comment to: "Control the path up to the node before
> the _system_ sink to avoid latency caused by memory copying; afterwards,
> saves context if the sink provides a callback".
>
>
But list_prev_entry(coresight_path_last_node(path)) doesn't do any thing
special for system sinks, it always returns from source up to the sink
regardless of the type. Shouldn't it be more like:
/* Disable up to the node before sink */
to = list_prev_entry(coresight_path_last_node(path), link);
coresight_disable_path_from_to(path, from, to);
/*
* Save sink. Most sinks don't implement a save callback to avoid
* latency caused by memory copying.
*/
ret = coresight_pm_device_save(coresight_get_sink(path));
>>> return 0;
>>> +
>>> +sink_failed:
>>> + if (!coresight_enable_path_from_to(path, coresight_get_mode(source),
>>> + from, to))
>>> + coresight_pm_device_restore(source);
>>> +
>>> + pr_err("Failed in coresight PM save on CPU%d: %d\n",
>>> + smp_processor_id(), ret);
>>> + this_cpu_write(percpu_pm_failed, true);
>>
>> Why does only a failing sink set percpu_pm_failed when failing to save the
>> source exits early.
>
> My purpose is to use "percpu_pm_failed == true" to prevent further
> issues if we already know the link has failed. To be clear, this is not
> a recovery mechanism; it simply means "if link enabling or disabling
> fails in CPU PM, do not continue to avoid potential hardware lockups.”
>
> The source save failure is a bit special, it fails at the beginning of
> the CoreSight CPU PM flow, and the returned error prevents the CPU idle
> flow from proceeding. As a result, there is nothing further to do.
>
>> Sashiko has a similar comment that this could result in
>> restoring uninitialised source save data later, but a comment in this
>> function about why the flow is like this would be helpful.
>
> The comment of "Restoring uninitialised source save data later" does not
> make sense.
>
> When the save fails, the notifier only rolls back the callbacks that ran
> before the failing callback by invoking CPU_PM_ENTER_FAILED. And there
> is no CPU_PM_EXIT, as the idle state is never entered.
>
Make sense.
>> We have coresight_disable_path_from_to() which always succeeds and doesn't
>> return an error. TRBE is the only sink with a pm_save_disable()
>> callback, but it always succeeds anyway.
>>
>> Would it not be much simpler to require that sink save/restore callbacks
>> always succeed and don't return anything? Seems like this percpu_pm_failed
>> stuff is extra complexity for a scenario that doesn't exist? The only thing
>> that can fail is saving the source but it doesn't goto sink_failed when that
>> happens.
>
> As you suggested, I can simplify the code with assuming sink ops
> always success, but we might expect complaints from Sashiko.
>
If we're keeping the exit value for sources and we're sharing the same
callback signature with sinks then it makes sense to keep the error
handling then. I was thinking it would be nice to remove it if we got
rid of both.
>> Ideally etm4_cpu_save() wouldn't have a return value either. It would be
>> good if we could find away to skip or ignore the timeouts in there somehow
>> because that's the only reason it can fail.
>
> Seems to me, ETMv4 has a timeout log and returns error is not bad, as
> it can be helpful to locate issue in a cheap way (sometimes, hang
> issue caused by lockup might be a time-consumed debugging).
>
> Thanks,
> Leo
More information about the linux-arm-kernel
mailing list