[PATCH v4 06/20] coresight: add try_get_module() in coresight_grab_device()

Tingwei Zhang tingweiz at codeaurora.org
Tue Jul 28 07:17:04 EDT 2020


Hi Mike,
On Tue, Jul 28, 2020 at 06:08:37PM +0800, Mike Leach wrote:
> Hi Tingwei,
> 
> On Tue, 28 Jul 2020 at 02:07, Tingwei Zhang <tingweiz at codeaurora.org> wrote:
> >
> > Hi Mike,
> >
> > On Tue, Jul 28, 2020 at 01:04:21AM +0800, Mike Leach wrote:
> > > Hi Tingwei,
> > >
> > > On Sun, 26 Jul 2020 at 02:32, Tingwei Zhang <tingweiz at codeaurora.org>
> > > wrote:
> > > >
> > > > Hi Mike,
> > > > On Sat, Jul 25, 2020 at 09:51:31PM +0800, Mike Leach wrote:
> > > > > Hi Tingwei,
> > > > >
> > > > > On Sat, 25 Jul 2020 at 11:05, Tingwei Zhang 
> > > > > <tingweiz at codeaurora.org>
> > > > > wrote:
> > > > > >
> > > > > > Hi Mike,
> > > > > >
> > > > > >  Thu, Jul 23, 2020 at 06:55:47PM +0800, Mike Leach wrote:
> > > > > > > Hi Tingwei,
> > > > > > >
> > > > > > > On Thu, 23 Jul 2020 at 05:28, Tingwei Zhang
> > > <tingwei at codeaurora.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > When coresight device is in an active session, driver module 
> > > > > > > > of
> > > > > > > > that device should not be removed. Use try_get_module() in
> > > > > > > > coresight_grab_device() to prevent module to be unloaded.
> > > > > > > >
> > > > > > > > Signed-off-by: Tingwei Zhang <tingwei at codeaurora.org>
> > > > > > > > ---
> > > > > > > >  drivers/hwtracing/coresight/coresight.c | 27
> > > > > +++++++++++++++++++++----
> > > > > > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/hwtracing/coresight/coresight.c
> > > > > > > b/drivers/hwtracing/coresight/coresight.c
> > > > > > > > index b7151c5f81b1..17bc76ea86ae 100644
> > > > > > > > --- a/drivers/hwtracing/coresight/coresight.c
> > > > > > > > +++ b/drivers/hwtracing/coresight/coresight.c
> > > > > > > > @@ -640,7 +640,7 @@ struct coresight_device
> > > > > > > *coresight_get_sink_by_id(u32 id)
> > > > > > > >   * don't appear on the trace path, they should be handled 
> > > > > > > > along
> > > > > with
> > > > > > > the
> > > > > > > >   * the master device.
> > > > > > > >   */
> > > > > > > > -static void coresight_grab_device(struct coresight_device
> > > *csdev)
> > > > > > > > +static int coresight_grab_device(struct coresight_device
> > > *csdev)
> > > > > > > >  {
> > > > > > > >         int i;
> > > > > > > >
> > > > > > > > @@ -648,10 +648,25 @@ static void coresight_grab_device(struct
> > > > > > > coresight_device *csdev)
> > > > > > > >                 struct coresight_device *child;
> > > > > > > >
> > > > > > > >                 child  = csdev->pdata->conns[i].child_dev;
> > > > > > > > -               if (child && child->type ==
> > > > > CORESIGHT_DEV_TYPE_HELPER)
> > > > > > > > +               if (child && child->type ==
> > > > > CORESIGHT_DEV_TYPE_HELPER) {
> > > > > > > > +                       if
> > > > > > > (!try_module_get(child->dev.parent->driver->owner))
> > > > > > > > +                               goto err;
> > > > > > > > 
> > > > > > > > pm_runtime_get_sync(child->dev.parent);
> > > > > > > > +               }
> > > > > > > >         }
> > > > > > > > +       if (!try_module_get(csdev->dev.parent->driver->owner))
> > > > > > > > +               goto err;
> > > > > > > >         pm_runtime_get_sync(csdev->dev.parent);
> > > > > > > > +       return 0;
> > > > > > > > +err:
> > > > > > > > +       for (i--; i >= 0; i--) {
> > > > > > > > +               struct coresight_device *child;
> > > > > > > > +
> > > > > > > > +               child  = csdev->pdata->conns[i].child_dev;
> > > > > > > > +               if (child && child->type ==
> > > > > CORESIGHT_DEV_TYPE_HELPER)
> > > > > > > > +                       pm_runtime_put(child->dev.parent);
> > > > > > > > +       }
> > > > > > > > +       return -ENODEV;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /*
> > > > > > > > @@ -663,12 +678,15 @@ static void coresight_drop_device(struct
> > > > > > > coresight_device *csdev)
> > > > > > > >         int i;
> > > > > > > >
> > > > > > > >         pm_runtime_put(csdev->dev.parent);
> > > > > > > > +       module_put(csdev->dev.parent->driver->owner);
> > > > > > > >         for (i = 0; i < csdev->pdata->nr_outport; i++) {
> > > > > > > >                 struct coresight_device *child;
> > > > > > > >
> > > > > > > >                 child  = csdev->pdata->conns[i].child_dev;
> > > > > > > > -               if (child && child->type ==
> > > > > CORESIGHT_DEV_TYPE_HELPER)
> > > > > > > > +               if (child && child->type ==
> > > > > CORESIGHT_DEV_TYPE_HELPER) {
> > > > > > > >                         pm_runtime_put(child->dev.parent);
> > > > > > > > +
> > > > > module_put(child->dev.parent->driver->owner);
> > > > > > > > +               }
> > > > > > > >         }
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -721,7 +739,8 @@ static int _coresight_build_path(struct
> > > > > > > coresight_device *csdev,
> > > > > > > >         if (!node)
> > > > > > > >                 return -ENOMEM;
> > > > > > > >
> > > > > > > > -       coresight_grab_device(csdev);
> > > > > > > > +       if (coresight_grab_device(csdev))
> > > > > > > > +               return -ENODEV;
> > > > > > > >         node->csdev = csdev;
> > > > > > > >         list_add(&node->link, path);
> > > > > > > >
> > > > > > > > --
> > > > > > > > The Qualcomm Innovation Center, Inc. is a member of the Code
> > > Aurora
> > > > > > > Forum,
> > > > > > > > a Linux Foundation Collaborative Project
> > > > > > > >
> > > > > > >
> > > > > > > The CTI devices are associated with other coresight components
> > > using
> > > > > > > csdev->ect_dev; These are not handled in the main linear trace
> > > path as
> > > > > > > these have a star topology interlinking all components.
> > > > > > > However, when a component has an associated ect_dev then is
> > > enabled at
> > > > > > > the same time as the other component is enabled.
> > > > > > >
> > > > > > > So a module_get/put will be needed for this association to 
> > > > > > > prevent
> > > the
> > > > > > > CTI being removed while a trace session is active.
> > > > > > > I suggest in coresight.c : coresight_control_assoc_ectdev()
> > > > > > >
> > > > > >
> > > > > > In module unload process, devices of that module will be removed. 
> > > > > > In
> > > > > > this case, cti_remove() is called on all cti device when unloading
> > > > > > cti module. All cti connections is cleaned at that time.
> > > > > > csdev->ect_dev is set to NULL. coresight_control_assoc_ectdev()
> > > > > > will return since since ect_csdev is NULL.
> > > > > >
> > > > > > I think we are safe without holding module reference since cti
> > > > > > driver has done a pretty good job on clean up.
> > > > > >
> > > > >
> > > > > The issue here is not about clean up. We need to keep all the
> > > > > programmed coresight modules loaded and operational while a 
> > > > > coresight
> > > > > trace session is in progress. The CTI can be used to control the
> > > > > generation of trace, trigger trace related events etc.
> > > > > If the module is removed while a session is in progress then this
> > > > > programming is lost and the trace data collected may not be what is
> > > > > expected.
> > > > >
> > > >
> > > > Got your point now. In my opinion, CTI is kinda different to other
> > > > coresight components since it's optional.
> > > >
> > > > For other components, they have to been there when path is built or
> > > > the enablement fails. Use can either get a successfully return which
> > > > means it's good and all devices/modules are there until path is
> > > > released, or a failed return which means trace session can't be
> > > > setup.
> > > >
> > >
> > > The module get/put do more than ensure that the system can be enabled.
> > > They ensure that no components in the coresight path can be unloaded
> > > until all paths in use are released. This ensures that clients do not
> > > get the trace system pulled out from underneath - corrupting trace
> > > data, and possibly affecting the client program.
> > >
> >
> > I agree.
> >
> > > > In CTI case, there's no garrantee that CTI related to the trace
> > > > session is there even the enablement is successful.
> > >
> > > And there is no guarantee that it is not programmed to perform a key
> > > task in controlling the generation of trace.
> > >
> > > The entire programmed coresight system must be protected when in use.
> > > That includes CTI devices. Otherwise the integrity of the captured
> > > trace could be compromised, and indeed there may be no trace captured
> > > if this is relying on CTI triggers.
> > >
> >
> > That's correct.  If CTI is not there, the trace may be wrong or no
> > trace as all.
> >
> > > >For example,
> > > > One cti is required for ETM tracing session. If cti module is
> > > > unloaded before enable that trace session. The enablement will
> > > > still be successfully done since there's no ect_dev connected
> > > > at all.
> > > >
> > > > My point is holding cti module reference in enable path won't
> > > > fix all the problem. Altenatively, user should be aware that
> > > > unload cti module leads to all cti functioniol failure.
> > > > With current design, the behavior is consistant on cti. User
> > > > can unload cti module whenever he wants but it will break
> > > > the function.
> > >
> > > Precisely - and this is what should never be allowed - why can the CTI
> > > be used to break the trace system but not other components?
> > >
> >
> 
> Sorry - I may not have been clear here - I do not want to make having
> a CTI mandatory for enabling the trace session - this is an optional
> component.
>
> > We can't prevent user to unload CTI module when there's no active
> > trace session.
> 
> Agreed - the same applies to all the other components.
> 
> > If we don't allow CTI to be not available, we need to
> > check whether CTI is there in build_path routine or coresight enable
> > routine. We need return failure if CTI is not there when user enables
> > the trace session. Currently, coresight framework check associate CTI
> > device, but if CTI deivce is not probed due to probe failure or
> > module unload, coresight framework assume there's no assocaite CTI
> > device and just continue the enablement.  It won't return failure.
> >
> > If we have plan to change CTI check to be mandatory, make the coresight
> > enablement fail if CTI is not there like other trace components.  I'm
> > perfect fine to add module_get/module_put for CTI.
> >
> 
> If the CTI is present at the time the trace path is created, then it
> must be enabled, and a failure to enable the CTI fails the entire path
> enable - this is the operation that is implemented by
> coresight_control_assoc_ectdev() at this time.
> If a CTI is not present at the time the trace path is created, then as
> you have said previously, the ect_dev pointer will be NULL and the
> check and enable of CTI will be skipped by
> coresight_control_assoc_ectdev() and have no effect on the trace path.
> 
> For the module version, if CTI is present and enabled then the module
> must be locked so it cannot be removed while the trace is active -
> this is the only change that is required. So adding get / put module
> in coresight_control_assoc_ectdev() will lock the module in for the
> duration that trace is active.
> 
> I've tested the code below to ensure that the cti module is correctly
> locked while trace is active, but can be removed as required when
> trace is inactive.
>

Thanks a lot for coming up with the patch and testing it, Mike. 
 
> Regards
> 
> Mike
> 
> ---------------------------------------------------------------------------------
> static int
> coresight_control_assoc_ectdev(struct coresight_device *csdev, bool enable)
> {
>     int ect_ret = 0;
>     struct coresight_device *ect_csdev = csdev->ect_dev;
>     struct module *mod;
> 
>     if (!ect_csdev)
>         return 0;
> 
>     if ((!ect_ops(ect_csdev)->enable) || (!ect_ops(ect_csdev)->disable))
>         return 0;
> 
>     mod = ect_csdev->dev.parent->driver->owner;
> 
>     if (enable) {
>         if (try_module_get(mod)) {
>             ect_ret = ect_ops(ect_csdev)->enable(ect_csdev);
>             if (ect_ret)
>                 module_put(mod);
>         } else
>             ect_ret = -ENODEV;
>     } else {
>         module_put(mod);
>         ect_ret = ect_ops(ect_csdev)->disable(ect_csdev);
>     }

If CTI module is removed before enablement of ETM and installed after ETM
enablement and before ETM disablement. We have a unbalanced module
reference number here. I made one test to prove this with above code.

We may need move module_put() after disable(). Return proper error from
disable() to indicate CTI was not enabled before so we can skip
module_put().

Tests:
db845c:/sys/bus/coresight/devices # echo 1 > tmc_etr0/enable_sink
db845c:/sys/bus/coresight/devices # echo 1 > etm1/enable_source
db845c:/sys/bus/coresight/devices # rmmod coresight_cti
rmmod: failed to unload coresight_cti: Try again
1|db845c:/sys/bus/coresight/devices # echo 0 > etm1/enable_source
db845c:/sys/bus/coresight/devices # rmmod coresight_cti
db845c:/sys/bus/coresight/devices # echo 1 > etm1/enable_source
db845c:/sys/bus/coresight/devices # insmod /data/modules/coresight-cti.ko
db845c:/sys/bus/coresight/devices # echo 1 > etm2/enable_source
db845c:/sys/bus/coresight/devices # echo 0 > etm1/enable_source
db845c:/sys/bus/coresight/devices # rmmod coresight_cti
db845c:/sys/bus/coresight/devices # 

Above module removal should fail since etm2 is still active now.

> 
>     /* output warning if ECT enable is preventing trace operation */
>     if (ect_ret)
>         dev_info(&csdev->dev, "Associated ECT device (%s) %s failed\n",
>              dev_name(&ect_csdev->dev),
>              enable ? "enable" : "disable");
>     return ect_ret;
> }
> -----------------------------------------------------------------------------------------
> Tests:-
> -----------------------
> root at linaro-developer:/home/linaro/scripts# ./trace-sysfs-start.bash
> Tracing via sysfs
> enabling etf
> wait before enable etm
> enabling etm 0.
> enabling etm 1.
> enabling etm 2.
> enabling etm 3.
> waiting for trace
> root at linaro-developer:/home/linaro/scripts# rmmod 
> ../cs-mods/coresight-etm4x.ko
> rmmod: ERROR: Module coresight_etm4x is in use
> root at linaro-developer:/home/linaro/scripts# rmmod 
> ../cs-mods/coresight-cti.ko
> rmmod: ERROR: Module coresight_cti is in use
> root at linaro-developer:/home/linaro/scripts# ./trace-sysfs-end.bash
> ending trace
> disabling etm 0.
> disabling etm 1.
> disabling etm 2.
> disabling etm 3.
> disabling etf
> 16+0 records in
> 16+0 records out
> 8192 bytes (8.2 kB) copied, 0.000834116 s, 9.8 MB/s
> root at linaro-developer:/home/linaro/scripts# rmmod 
> ../cs-mods/coresight-cti.ko
> root at linaro-developer:/home/linaro/scripts# rmmod 
> ../cs-mods/coresight-etm4x.ko
> root at linaro-developer:/home/linaro/scripts# insmod 
> ../cs-mods/coresight-etm4x.ko
> root at linaro-developer:/home/linaro/scripts# ./trace-sysfs-start.bash
> Tracing via sysfs
> enabling etf
> wait before enable etm
> enabling etm 0.
> enabling etm 1.
> enabling etm 2.
> enabling etm 3.
> waiting for trace
> root at linaro-developer:/home/linaro/scripts# rmmod 
> ../cs-mods/coresight-etm4x.ko
> rmmod: ERROR: Module coresight_etm4x is in use
> root at linaro-developer:/home/linaro/scripts# rmmod 
> ../cs-mods/coresight-cti.ko
> rmmod: ERROR: Module coresight_cti is not currently loaded
> root at linaro-developer:/home/linaro/scripts# ./trace-sysfs-end.bash
> ending trace
> disabling etm 0.
> disabling etm 1.
> disabling etm 2.
> disabling etm 3.
> disabling etf
> 16+0 records in
> 16+0 records out
> 8192 bytes (8.2 kB) copied, 0.000819117 s, 10.0 MB/s
> 
> }
> 
> 
> 
> > > Moving forwards we will see users who simply use pre-programmed
> > > settings from client programs such as perf. It may be that some users
> > > do not have the credentials to add and remove modules - this could be
> > > an admin function where larger systems with multiple sessions are in
> > > use.
> > >
> > >
> > > Regards
> > >
> > > Mike
> > >
> > >
> > > >
> > > > Thanks, Tingwei
> > > >
> > > > > Regards
> > > > >
> > > > > Mike
> > > > >
> > > > > > Let me know if you still think we need hold the module reference.
> > > > > >
> > > > > > Thanks, Tingwei
> > > > > > > Regards
> > > > > > >
> > > > > > > Mike
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Mike Leach
> > > > > > > Principal Engineer, ARM Ltd.
> > > > > > > Manchester Design Centre. UK
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > linux-arm-kernel mailing list
> > > > > > > linux-arm-kernel at lists.infradead.org
> > > > > > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Mike Leach
> > > > > Principal Engineer, ARM Ltd.
> > > > > Manchester Design Centre. UK
> > > > >
> > > > > _______________________________________________
> > > > > linux-arm-kernel mailing list
> > > > > linux-arm-kernel at lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > >
> > >
> > >
> > > --
> > > Mike Leach
> > > Principal Engineer, ARM Ltd.
> > > Manchester Design Centre. UK
> > >
> > > _______________________________________________
> > > linux-arm-kernel mailing list
> > > linux-arm-kernel at lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> 
> 
> --
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK



More information about the linux-arm-kernel mailing list