resource leak in firmware/arm_scmi driver

Cristian Marussi cristian.marussi at arm.com
Thu Oct 13 04:48:03 PDT 2022


On Mon, Oct 10, 2022 at 10:45:45AM +0200, Uwe Kleine-König wrote:
> Hello,
> 

Hi Uwe,

thanks for reporting this.

> during some janitorial cleanup I stumbled over a resource leak in
> drivers/firmware/arm_scmi/driver.c.
> 
> The problem is as follows:
> 
> scmi_remove() might return early if info->users is non-zero. The driver
> core however ignores the return value of scmi_remove() and removes the
> device and frees the devm-allocated resources (e.g. *info).
> 
> So letting aside that some resources are never freed after a failed call
> of scmi_remove(), the user of the scmi node will probably stumble over
> accessing freed memory soon. I wouldn't be surprised if that was
> exploitable.
> 

At first, I thought that this was only some kind of mess that needed to
be cleaned up in the remove/exit path but without any real way of being
exposed: this *bad* assumption of mine came from the fact that the SCMI
drivers (like scmi-cpufreq scmi-hwmon) register to use the core SCMI stack
services using an exported symbol (scmi_register_driver) and then, additionally,
each SCMI driver willing to use some the core SCMI protocols issues a
try_module_get (since the protocols could have been modularized)

So that in a module scenario (with scmi-module being the core platform
driver:

root at deb-buster-arm64:~# lsmod 
Module                  Size  Used by
scmi_hwmon             16384  0
scmi_module           131072  2 scmi_hwmon

root at deb-buster-arm64:~# rmmod ./scmi-module.ko 
rmmod: ERROR: Module scmi_module is in use by: scmi_hwmon

BUT, then I realized that all of this refcounting is not going to save me
from an explicitly triggered unbind:

root at deb-buster-arm64:~# echo 'firmware:scmi' > /sys/bus/platform/drivers/arm-scmi/unbind 
[11601.798671] arm-scmi firmware:scmi: remove callback returned a non-zero value. This will be ignored.

This will indeed expose the issue you reported.

> I quickly tried to fix this issue, but didn't understand the driver good
> enough. I think a fix would involve adding a get_device() call to
> scmi_handle_get() to prevent scmi_remove() being called while a handle
> exists.
> 

I tried to address the issue with the get_device() calls as you
suggested, but it has really no effect and in fact looking at the code
on the remove path there is nothing that seems to be taken into
consideration to stop a removal (and this indeed is consistent with the
fact that any error returned on the removal/exit path is effectively ignored).

What instead DOES seem to work, it is to add a proper devlink with
device_link_add() declaring any SCMI driver device (like from cpufreq)
as a consumer of the core stack platform device: this will NOT stop the
removal, BUT it does trigger an implicit unbind at first of all the SCMI
drivers currently loaded when an unbind is requested for the core SCMI
platform device: as a consequence the removal can now be triggered
safely by an unbind even if some SCMI driver users are present (they
will be unbound too)

Now, for this cleanup to fully work, some (long due) rework of the SCMI
core devices handling and removal is needed (and it has partially
already started in other unrelated series), so my plan would be to, at
first, immediately add a '.suppress_bind_attrs = true' to the core driver
to mitigate this (which is also easily portable to stable) and then
rework properly the core SCMI removeal routines to safely re-enable
a proper unbind. (or not re-enabling it if deemed not worth BUT anyway
reworking the core SCMI remove/exit path)

I'll check my plan with Sudeep, any thoughts from you about this ?
Am I missing something more ?

Thanks a lot.

Regards,
Cristian



More information about the linux-arm-kernel mailing list