[PATCH V4 0/5] arm_scmi: vendors: Qualcomm Generic Vendor Extensions

Sibi Sankar quic_sibis at quicinc.com
Mon Dec 23 06:09:01 PST 2024



On 12/17/24 20:15, Cristian Marussi wrote:
> On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
>>
>>
>> On 12/5/24 22:31, Sudeep Holla wrote:
>>> On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
>>>> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
>>>>> On 11/8/24 20:44, Johan Hovold wrote:
>>>>
>>>>>>> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
>>>>
>>>>>>>> Second, after loading the protocol and client drivers manually (in that
>>>>>>>> order, shouldn't the client driver pull in the protocol?), I got:
>>>>>>>>
>>>>>>>> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
>>>>>>>> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
>>>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
>>>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
>>>>>>>>
>>>>>>>> which seems to suggest that the firmware on my CRD does not support this
>>>>>>>> feature. Is that the way this should be interpreted? And does that mean
>>>>>>>> that non of the commercial laptops supports this either?
>>>>
>>>>>> Yeah, hopefully Sibi can shed some light on this. I'm using the DT
>>>>>> patch (5/5) from this series, which according to the commit message is
>>>>>> supposed to enable bus scaling on the x1e80100 platform. So I guess
>>>>>> something is missing in my firmware.
>>>>>
>>>>> Nah, it's probably just because of the algo string used.
>>>>> The past few series used caps MEMLAT string instead of
>>>>> memlat to pass the tuneables, looks like all the laptops
>>>>> havn't really switched to it yet. Will revert back to
>>>>> using to lower case memlat so that all devices are
>>>>> supported. Thanks for trying the series out!
>>>>
>>>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>>>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>>>> and the memlat driver probes successfully.
>>>>
>>>> On the other hand, this series seems to have no effect on a kernel
>>>> compilation benchmark. Is that expected?
>>>>
>>>
>>> Hijacking this thread to rant about state of firmware implementation on
>>> this platform that gives me zero confidence in merging any of these without
>>> examining each of the interface details in depth and at lengths.
>>>
>>
> 
> Hi Sibi,
> 
>> Hey Sudeep,
>>
>> Thanks for taking time to review the series.
>>
>>> Also I see the standard protocol like PERF seem to have so many issues which
>>> adds to my no confidence. I can't comment on that thread for specific reasons.
>>
>> ^^ is largely untrue, a lot of finger pointing and a gross
>> misrepresentation of reality :/
>>
>> The only major problem that X1E perf protocol has is a firmware
>> crash in the LEVEL_GET regular message implementation. This
>> pretty much went unnoticed because of messaging in perf implementation
>> in kernel. Given the fastchannel implementation isn't mandatory
>> according to spec, the kernel clearly says it switches to
>> regular messaging when it clearly doesn't do that and uses
>> stale data structures instead. This ensured that level get regular
>> messaging never got tested.
> 
> You claimed this a couple of times here and on IRC, but sincerely,
> looking at the fastchannel implementation in SCMI core and Perf, I could
> not track down where this could have happened in the recent code
> (i.e. with or without your recent, welcomed, patches...)
> 
> When FC initialization fails and bailout it says:
> 	
> 	"Failed to get FC for protocol %X [MSG_ID:%u / RES_ID:%u] - ret:%d. Using regular messaging."
> 
> ... and it clears any gathered address for that FC, so that in __scmi_perf_level_get()
> you end up skipping the FC machinery and use messaging
> 
> 	if (dom->fc_info && dom->fc_info[PERF_FC_LEVEL].get_addr) {
> 		...
> 	}
> 
> 	return scmi_perf_msg_level_get(ph, dom->id, level, poll);
> 
> Now this is done ONLY for the FC that specifically failed
> initialization, i.e. identified by the tuple PROTO_ID/MSG_ID/RES_ID
> (as stated in the noisy message above where MSG_ID is specified) NOT for
> all Fastchannel, so you can have an FC successfully initialized only on
> the GET but failing in the SET, so only the GET FC will be used.
> 
> I dont really understand how the Kernel was misbehaving and using
> instead stale data, neither, if this was the case, I can see where this
> issue would have been fixed.
> 
> To be clear, I am not really interested in throwing an argument here, but
> I sincerely dont see where the alleged problem was and how was fixed (kernel
> side), so I fear it could be still there, hidden maybe by a change in the
> platform fw.
> 
> Apologies if I missed something along the history of this..

lol, this is pretty embarrassing :|, It's just like you said
looks like this fw supports get_level fastchannel but fails
to say it supports it. This was the reason behind get_level
regular message for being never tested and being buggy and
had nothing to do the kernel messaging or being buggy.
My bad :(, sry again.

-Sibi

> 
> Thanks,
> Cristian



More information about the linux-arm-kernel mailing list