[PATCH 11/22] firmware: arm_scmi: Add SCMIv3.1 extended names protocols support

Cristian Marussi cristian.marussi at arm.com
Wed Jun 15 10:32:31 PDT 2022


On Wed, Jun 15, 2022 at 10:19:53AM -0700, Florian Fainelli wrote:
> On 6/15/22 09:29, Cristian Marussi wrote:
> > On Wed, Jun 15, 2022 at 09:10:03AM -0700, Florian Fainelli wrote:
> > > On 6/15/22 02:40, Cristian Marussi wrote:
> > > > On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote:
> > > > > On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote:
> > > > > > 
> > > > > > 
> > > > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote:
> > > > > > > Using the common protocol helper implementation add support for all new
> > > > > > > SCMIv3.1 extended names commands related to all protocols with the
> > > > > > > exception of SENSOR_AXIS_GET_NAME.
> > > > > > > 
> > > > > > > Signed-off-by: Cristian Marussi <cristian.marussi at arm.com>
> > > > > > 
> > > > > > This causes the following splat on a platform where regulators fail to
> > > > > > initialize:
> > > > > > 
> > > > > 
> > > > > Hi Florian,
> > > > > 
> > > > > thanks for the report.
> > > > > 
> > > > > It seems a memory error while allocating so it was not meant to be
> > > > > solved by the fixes, anyway, I've never seen this splat in my testing
> > > > > and at first sight I cannot see anything wrong in the devm_k* calls
> > > > > inside scmi_voltage_protocol_init...is there any particular config in
> > > > > your setup ?
> > > > > 
> > > > > Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has
> > > > > slightly changed with -rc-1, so I'll try rebasing on that at first and
> > > > > see if I can reproduce the issue locally.
> > > > > 
> > > > 
> > > > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot
> > > > reproduce in my setup with a few (~9) good and bad voltage domains.
> > > > 
> > > > How many voltage domains are advertised by the platform in your setup ?
> > > 
> > > There are 11 voltage regulators on this platform, and of course, now that I
> > > am trying to reproduce the splat I reported I just cannot anymore... I will
> > > let you know if there is anything that needs to be done. Thanks for being
> > > responsive as usual!
> > 
> > ... you're welcome...
> > 
> > I'm trying to figure out where an abnormal mem request could happen...
> 
> I think the problem is/was with the number of voltage domains being reported
> which was way too big and passed directly, without any clamping to
> devm_kcalloc() resulting the splat indicating that the allocation was beyond
> the MAX_ORDER. The specification allows for up to 2^16 domains which would
> still be way too much to allocate using kmalloc() so we could/should
> consider vmalloc() here eventually?
> 

Yes I was suspicious about the same and I was starting to think about vmalloc
in general across all domain enumerations even if this is may not be the case
for your splat...

> In all likelihood though we probably won't find a system with 65k voltage
> domains.
> 
> > 
> > can you try adding this (for brutal debugging) when you try ?
> > (...just to rule out funny fw replies.... :D)
> 
> Sure, nothing weird coming out and it succeeded in enumerating all of the
> regulators, I smell a transient issue with our firmware implementation,
> maybe...
> 
> [    0.560544] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.560617] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.560673] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.560730] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.560881] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.560940] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.560996] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.561054] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.561110] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.561168] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.561225] arm-scmi brcm_scmi at 0: num_returned:1  num_remaining:0
> [    0.561652] scmi-regulator scmi_dev.2: Regulator stb_vreg_2 registered
> for domain [2]
> [    0.561858] scmi-regulator scmi_dev.2: Regulator stb_vreg_3 registered
> for domain [3]
> [    0.562030] scmi-regulator scmi_dev.2: Regulator stb_vreg_4 registered
> for domain [4]
> [    0.562190] scmi-regulator scmi_dev.2: Regulator stb_vreg_5 registered
> for domain [5]
> [    0.564427] scmi-regulator scmi_dev.2: Regulator stb_vreg_6 registered
> for domain [6]
> [    0.564638] scmi-regulator scmi_dev.2: Regulator stb_vreg_7 registered
> for domain [7]
> [    0.564817] scmi-regulator scmi_dev.2: Regulator stb_vreg_8 registered
> for domain [8]
> [    0.565030] scmi-regulator scmi_dev.2: Regulator stb_vreg_9 registered
> for domain [9]
> [    0.565191] scmi-regulator scmi_dev.2: Regulator stb_vreg_10 registered
> for domain [10]
> 

Ok, this was another place where a reply carrying a consistent number of
non-segmented entries could trigger an jumbo-request to kmalloc...

..maybe this is where a transient fw issue can reply with a dramatic
numbers of possible (non-segmented) levels (num_returned+num_remaining)

(I filter out replies carrying descriptors for segmented voltage-levels
that does not come in triplets (returned:3  remaining:0) since they are out
of spec...but I just hit today on another fw sending such out of spec
reply for clocks and I will relax such req probably not to break too
many out-of-spec blobs out in the wild :P)

So let me know if got new splats...

Thanks,
Cristian




More information about the linux-arm-kernel mailing list