[PATCH] coresight: Add mode check when enable/disable csdev source

James Clark james.clark at arm.com
Thu Sep 21 08:36:04 PDT 2023



On 21/09/2023 14:29, Junhao He wrote:
> The commmit [1] replace "source_ops(csdev)::enable()" with
> coresight_enable_source(). After this patch, when enable ETM through
> perf mode, the csdev::enable will be set to true. Then if we to disable
> the ETM by sysfs mode at the time, the resource will be released
> incorrectly, kernel will report the following call trace.
> 
> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
> echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
> echo 0 > /sys/bus/coresight/devices/etm0/enable_source
> Unable to handle kernel NULL pointer dereference at virtual
> address 00000000000001d0
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> ...
> Call trace:
>  etm4_disable+0x54/0x150 [coresight_etm4x]
>  coresight_disable_source+0x6c/0x98 [coresight]
>  coresight_disable+0x74/0x1c0 [coresight]
>  enable_source_store+0x88/0xa0 [coresight]
>  dev_attr_store+0x20/0x40
>  sysfs_kf_write+0x4c/0x68
>  kernfs_fop_write_iter+0x120/0x1b8
>  vfs_write+0x2dc/0x3b0
>  ksys_write+0x70/0x108
>  __arm64_sys_write+0x24/0x38
>  invoke_syscall+0x50/0x128
>  el0_svc_common.constprop.0+0x104/0x130
>  do_el0_svc+0x40/0xb8
>  el0_svc+0x2c/0xb8
>  el0t_64_sync_handler+0xc0/0xc8
>  el0t_64_sync+0x1a4/0x1a8
> Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
> ---[ end trace 0000000000000000 ]---
> 
> The csdev::enable specify the device is currently part of an active
> path, when enabling source via perf mode, csdev::enable also should
> be set to true. We can add mode check in etm4_disable() to fix that,
> if it's done, the sysfs cannot report busy when sysfs to enable ETM
> that has been enabled by perf.
> 
> Struct coresight_device add member of mode to check device busy in
> coresight_(enable/disable)_source function to fixes handle kernel
> NULL pointer, and sysfs report busy if perf session is already using
> the ETM.
> By the way, inperhaps another cleanup patch may be upload to remove
> the etmv4_drvdata::mode tmc_drvdata::mode or others.
> 

Hi Junhao,

I definitely think we should do the cleanup at the same time as this
change. Otherwise we risk leaving in the other modes which are
duplicates of this and could easily cause confusion.

Although I saw there are a couple of modes that are not enum cs_mode,
but a different kind of mode. And it's not always obvious because enum
cs_mode is sometimes stored as an int...

> Test:
> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
> cat /sys/bus/coresight/devices/etm2/enable_source
> 1
> echo 1 > /sys/bus/coresight/devices/etm2/enable_source
> -bash: echo: write error: Device or resource busy
> echo 0 > /sys/bus/coresight/devices/etm2/enable_source
> coresight etm2: Someone is already using the tracer
> cat /sys/bus/coresight/devices/etm2/enable_source
> 1
> 

I tested the fix and it works, although I have a few comments below on
whether we should do something else instead.

> [1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")

What was the original behavior before that change? Maybe it didn't have
exactly the same error but was it still possible to enable and disable a
sink after starting a Perf session? Seems like on the disable the same
thing would have happened or something else bad?

> Signed-off-by: Junhao He <hejunhao3 at huawei.com>
> ---
>  drivers/hwtracing/coresight/coresight-core.c   | 18 ++++++++++++++++--
>  .../hwtracing/coresight/coresight-etm-perf.c   |  6 +++++-
>  drivers/hwtracing/coresight/coresight-priv.h   |  3 ++-
>  include/linux/coresight.h                      | 14 ++++++++------
>  4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 206f17826399..7b485fc2ed19 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -388,6 +388,7 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode,
>  				return ret;
>  		}
>  		csdev->enable = true;
> +		csdev->mode = mode;
>  	}
>  
>  	atomic_inc(&csdev->refcnt);
> @@ -446,18 +447,26 @@ static void coresight_disable_helpers(struct coresight_device *csdev)
>   *  the device if there are no users left.
>   *
>   *  @csdev: The coresight device to disable
> + *  @mode: How the coresight device is being used, perf mode or sysfs mode.
>   *  @data: Opaque data to pass on to the disable function of the source device.
>   *         For example in perf mode this is a pointer to the struct perf_event.
>   *
>   *  Returns true if the device has been disabled.
>   */
> -bool coresight_disable_source(struct coresight_device *csdev, void *data)
> +bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode,
> +			      void *data)
>  {
> +	if (csdev->mode && csdev->mode != mode) {

Can you do "csdev->mode != DISABLED" instead of just the implicit check
for non zero. Although probably only "csdev->mode != mode" is enough
unless you expect coresight_disable_source() to be called with mode =
DISABLED.

> +		dev_err(&csdev->dev, "Someone is already using the tracer\n");
> +		return false;
> +	}
> +
>  	if (atomic_dec_return(&csdev->refcnt) == 0) {
>  		if (source_ops(csdev)->disable)
>  			source_ops(csdev)->disable(csdev, data);
>  		coresight_disable_helpers(csdev);
>  		csdev->enable = false;
> +		csdev->mode = CS_MODE_DISABLED;
>  	}
>  	return !csdev->enable;
>  }
> @@ -1117,6 +1126,11 @@ int coresight_enable(struct coresight_device *csdev)
>  	if (ret)
>  		goto out;
>  
> +	if (csdev->mode == CS_MODE_PERF) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +

I'm struggling to understand why a reference count wouldn't work, rather
than blocking on whether there is a Perf session active or not. Because
now you can do enable via sysfs first, then open the Perf session, but
not the other way around.

Is there a reason for the behavior to not be symmetrical?

And refcounts are already used in the code for the individual devices.

Thanks
James




More information about the linux-arm-kernel mailing list