[PATCH v10 16/20] coresight: Add PM callbacks for sink device

Fri Apr 10 01:55:47 PDT 2026

On 09/04/2026 4:44 pm, Leo Yan wrote:
> On Thu, Apr 09, 2026 at 11:52:00AM +0100, James Clark wrote:
> 
> [...]
> 
>>> @@ -1787,15 +1808,32 @@ static int coresight_pm_save(struct coresight_path *path)
>>>    	to = list_prev_entry(coresight_path_last_node(path), link);
>>>    	coresight_disable_path_from_to(path, from, to);
>>> +	ret = coresight_pm_device_save(coresight_get_sink(path));
>>> +	if (ret)
>>> +		goto sink_failed;
>>> +
>>
>> The comment directly above this says "Up to the node before sink to avoid
>> latency". But then this line goes and saves the sink anyway. So I'm not sure
>> what's meant by the comment?
> 
> I will refine the comment to: "Control the path up to the node before
> the _system_ sink to avoid latency caused by memory copying; afterwards,
> saves context if the sink provides a callback".
> 
> 

But list_prev_entry(coresight_path_last_node(path)) doesn't do any thing 
special for system sinks, it always returns from source up to the sink 
regardless of the type. Shouldn't it be more like:

    /* Disable up to the node before sink */
    to = list_prev_entry(coresight_path_last_node(path), link);
    coresight_disable_path_from_to(path, from, to);

    /*
     * Save sink. Most sinks don't implement a save callback to avoid
     * latency caused by memory copying.
     */
    ret = coresight_pm_device_save(coresight_get_sink(path));

>>>    	return 0;
>>> +
>>> +sink_failed:
>>> +	if (!coresight_enable_path_from_to(path, coresight_get_mode(source),
>>> +					   from, to))
>>> +		coresight_pm_device_restore(source);
>>> +
>>> +	pr_err("Failed in coresight PM save on CPU%d: %d\n",
>>> +	       smp_processor_id(), ret);
>>> +	this_cpu_write(percpu_pm_failed, true);
>>
>> Why does only a failing sink set percpu_pm_failed when failing to save the
>> source exits early.
> 
> My purpose is to use "percpu_pm_failed == true" to prevent further
> issues if we already know the link has failed.  To be clear, this is not
> a recovery mechanism; it simply means "if link enabling or disabling
> fails in CPU PM, do not continue to avoid potential hardware lockups.”
> 
> The source save failure is a bit special, it fails at the beginning of
> the CoreSight CPU PM flow, and the returned error prevents the CPU idle
> flow from proceeding.  As a result, there is nothing further to do.
> 
>> Sashiko has a similar comment that this could result in
>> restoring uninitialised source save data later, but a comment in this
>> function about why the flow is like this would be helpful.
> 
> The comment of "Restoring uninitialised source save data later" does not
> make sense.
> 
> When the save fails, the notifier only rolls back the callbacks that ran
> before the failing callback by invoking CPU_PM_ENTER_FAILED.  And there
> is no CPU_PM_EXIT, as the idle state is never entered.
> 

Make sense.

>> We have coresight_disable_path_from_to() which always succeeds and doesn't
>> return an error. TRBE is the only sink with a pm_save_disable()
>> callback, but it always succeeds anyway.
>>
>> Would it not be much simpler to require that sink save/restore callbacks
>> always succeed and don't return anything? Seems like this percpu_pm_failed
>> stuff is extra complexity for a scenario that doesn't exist? The only thing
>> that can fail is saving the source but it doesn't goto sink_failed when that
>> happens.
> 
> As you suggested, I can simplify the code with assuming sink ops
> always success, but we might expect complaints from Sashiko.
> 

If we're keeping the exit value for sources and we're sharing the same 
callback signature with sinks then it makes sense to keep the error 
handling then. I was thinking it would be nice to remove it if we got 
rid of both.

>> Ideally etm4_cpu_save() wouldn't have a return value either. It would be
>> good if we could find away to skip or ignore the timeouts in there somehow
>> because that's the only reason it can fail.
> 
> Seems to me, ETMv4 has a timeout log and returns error is not bad, as
> it can be helpful to locate issue in a cheap way (sometimes, hang
> issue caused by lockup might be a time-consumed debugging).
> 
> Thanks,
> Leo