[PATCH v2 08/13] firmware: arm_scmi: Harden clock protocol initialization
Cristian Marussi
cristian.marussi at arm.com
Fri Apr 24 06:32:05 PDT 2026
On Fri, Apr 24, 2026 at 02:07:59PM +0200, Nicolas Frattaroli wrote:
> On Tuesday, 10 March 2026 19:40:25 Central European Summer Time Cristian Marussi wrote:
> > Add proper error handling on failure to enumerate clocks features or
> > rates.
> >
Hi,
> > Signed-off-by: Cristian Marussi <cristian.marussi at arm.com>
> > ---
> > drivers/firmware/arm_scmi/clock.c | 22 ++++++++++++++++------
> > 1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/firmware/arm_scmi/clock.c b/drivers/firmware/arm_scmi/clock.c
> > index c9b62edce4fd..bf956305a8fe 100644
> > --- a/drivers/firmware/arm_scmi/clock.c
> > +++ b/drivers/firmware/arm_scmi/clock.c
> > @@ -402,10 +402,16 @@ static int scmi_clock_attributes_get(const struct scmi_protocol_handle *ph,
> > SUPPORTS_RATE_CHANGE_REQUESTED_NOTIF(attributes))
> > clk->rate_change_requested_notifications = true;
> > if (PROTOCOL_REV_MAJOR(ph->version) >= 0x3) {
> > - if (SUPPORTS_PARENT_CLOCK(attributes))
> > - scmi_clock_possible_parents(ph, clk_id, cinfo);
> > - if (SUPPORTS_GET_PERMISSIONS(attributes))
> > - scmi_clock_get_permissions(ph, clk_id, clk);
> > + if (SUPPORTS_PARENT_CLOCK(attributes)) {
> > + ret = scmi_clock_possible_parents(ph, clk_id, cinfo);
> > + if (ret)
> > + return ret;
> > + }
> > + if (SUPPORTS_GET_PERMISSIONS(attributes)) {
> > + ret = scmi_clock_get_permissions(ph, clk_id, clk);
> > + if (ret)
> > + return ret;
> > + }
> > if (SUPPORTS_EXTENDED_CONFIG(attributes))
> > clk->extended_config = true;
> > }
> > @@ -1143,8 +1149,12 @@ static int scmi_clock_protocol_init(const struct scmi_protocol_handle *ph)
> > for (clkid = 0; clkid < cinfo->num_clocks; clkid++) {
> > cinfo->clkds[clkid].id = clkid;
> > ret = scmi_clock_attributes_get(ph, clkid, cinfo);
> > - if (!ret)
> > - scmi_clock_describe_rates_get(ph, clkid, cinfo);
> > + if (ret)
> > + return ret;
> > +
> > + ret = scmi_clock_describe_rates_get(ph, clkid, cinfo);
> > + if (ret)
> > + return ret;
> > }
> >
> > if (PROTOCOL_REV_MAJOR(ph->version) >= 0x3) {
> >
>
> I see that a quirk is being added for this, but I thought I should chime
> in with my opinion for future approaches in this direction.
>
> I don't see how this hardens anything. All this does is break platforms
> that were previously working by returning early. At most, this should
Certainly the naming in the subject was chosen badly (by me!)...indeed it
should be more something like "Enforce strict protocol compliance",
because at the end all of the broken platforms really run a slighly odd
out of spec SCMI firmware that does NOT implement one or more of the SCMI
mandatory command...
> be a warning (as in not WARN but pr_warn/dev_warn/...). If firmware
> returns nonsense, a clock driver should imho try its best to work
> around the nonsense in a safe way, because the alternative is that
> a major part of the system (and thus likely the entire system) no
..well yes we definitely dont want to break deployed platforms BUT also
we dont want to legalize this kind of out of spec behaviour in future
firmwares...hence (a number ?) of quirks an FW_BUG warns probably to
let already broken deployed platforms survive while discouraging such
implementation in future fw implementations...
These firmware most certainly wont pass the SCMI compliance test suite [1],
which indeed we do not mandate, but the reason these bugs happened is
exactly because the kernel SCMI stack was buggy and left that door open...
More specifically these kind of out-of-spec behaviours are not really just
a matter being 'picky', the problem is that any resource set in any
SCMI protocol is defined by the spec such as to be described by a
contiguos set of IDs and the drivers are designed anyway under that
assumption from the allocation point of view, so allowing a clock ID to
just fail one of the mandatory commands and skip a domain would jeopardize
all of this and, even if clearly is NOT a problem here, seems a fragile
assumption.
> longer works. It's basically the same reason why we avoid BUG(): sure,
> you prevented a problem, but you tore down the entire system to tell
> the user about it.
No I feel this is a lot different from the BUG() scenario: we'll never
merge this like this without enough quirks to let survive all the
existing impacted platforms, but the shout here is for the fw-developer
of a new un-deployed firmware: that WILL fail and no quirk will be accepted
if they plan to design a future FW out of spec from scratch because they
think is better....well...they can anyway of course but they will have
to keep their own quirks donwstream forever :P
Thanks,
Cristian
[1]: https://gitlab.arm.com/tests/scmi-tests/-/commits/master?ref_type=heads
More information about the linux-arm-kernel
mailing list