[PATCH] coresight: Add mode check when enable/disable csdev source

Suzuki K Poulose suzuki.poulose at arm.com
Mon Oct 2 03:32:30 PDT 2023


Hello

On 26/09/2023 12:43, hejunhao wrote:
> Hi, Suzuki
> 
> 
> On 2023/9/22 17:20, Suzuki K Poulose wrote:
>> On 21/09/2023 16:36, James Clark wrote:
>>>
>>>
>>> On 21/09/2023 14:29, Junhao He wrote:
>>>> The commmit [1] replace "source_ops(csdev)::enable()" with
>>>> coresight_enable_source(). After this patch, when enable ETM through
>>>> perf mode, the csdev::enable will be set to true. Then if we to disable
>>>> the ETM by sysfs mode at the time, the resource will be released
>>>> incorrectly, kernel will report the following call trace.
>>>>
>>>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>>>> echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink
>>>> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>>>> echo 0 > /sys/bus/coresight/devices/etm0/enable_source
>>>> Unable to handle kernel NULL pointer dereference at virtual
>>>> address 00000000000001d0
>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>> ...
>>>> Call trace:
>>>>   etm4_disable+0x54/0x150 [coresight_etm4x]
>>>>   coresight_disable_source+0x6c/0x98 [coresight]
>>>>   coresight_disable+0x74/0x1c0 [coresight]
>>>>   enable_source_store+0x88/0xa0 [coresight]
>>>>   dev_attr_store+0x20/0x40
>>>>   sysfs_kf_write+0x4c/0x68
>>>>   kernfs_fop_write_iter+0x120/0x1b8
>>>>   vfs_write+0x2dc/0x3b0
>>>>   ksys_write+0x70/0x108
>>>>   __arm64_sys_write+0x24/0x38
>>>>   invoke_syscall+0x50/0x128
>>>>   el0_svc_common.constprop.0+0x104/0x130
>>>>   do_el0_svc+0x40/0xb8
>>>>   el0_svc+0x2c/0xb8
>>>>   el0t_64_sync_handler+0xc0/0xc8
>>>>   el0t_64_sync+0x1a4/0x1a8
>>>> Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5)
>>>> ---[ end trace 0000000000000000 ]---
>>>>
>>>> The csdev::enable specify the device is currently part of an active
>>>> path, when enabling source via perf mode, csdev::enable also should
>>>> be set to true. We can add mode check in etm4_disable() to fix that,
>>>> if it's done, the sysfs cannot report busy when sysfs to enable ETM
>>>> that has been enabled by perf.
>>>>
>>>> Struct coresight_device add member of mode to check device busy in
>>>> coresight_(enable/disable)_source function to fixes handle kernel
>>>> NULL pointer, and sysfs report busy if perf session is already using
>>>> the ETM.
>>>> By the way, inperhaps another cleanup patch may be upload to remove
>>>> the etmv4_drvdata::mode tmc_drvdata::mode or others.
>>>>
>>>
>>> Hi Junhao,
>>>
>>> I definitely think we should do the cleanup at the same time as this
>>> change. Otherwise we risk leaving in the other modes which are
>>> duplicates of this and could easily cause confusion.
>>>
>>> Although I saw there are a couple of modes that are not enum cs_mode,
>>> but a different kind of mode. And it's not always obvious because enum
>>> cs_mode is sometimes stored as an int...
>>>
>>>> Test:
>>>> perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 &
>>>> cat /sys/bus/coresight/devices/etm2/enable_source
>>>> 1
>>>> echo 1 > /sys/bus/coresight/devices/etm2/enable_source
>>>> -bash: echo: write error: Device or resource busy
>>>> echo 0 > /sys/bus/coresight/devices/etm2/enable_source
>>>> coresight etm2: Someone is already using the tracer
>>>> cat /sys/bus/coresight/devices/etm2/enable_source
>>>> 1
>>>>
>>>
>>> I tested the fix and it works, although I have a few comments below on
>>> whether we should do something else instead.
>>>
>>>> [1] "coresight: Enable and disable helper devices adjacent to the 
>>>> path"> Fixes: 6148652807ba ("coresight: Enable and disable helper 
>>>> devices
>>> adjacent to the path")
>>>
>>> What was the original behavior before that change? Maybe it didn't have
>>> exactly the same error but was it still possible to enable and disable a
>>> sink after starting a Perf session? Seems like on the disable the same
>>> thing would have happened or something else bad?
>>
>> Before that, the perf was using the source_ops->enable/disable directly.
>> And thus the csdev->enable was not set, forcing the sysfs mode to try
>> enabling it via the source_ops->enable, which would reject the request.
>>
>> But with the switch to using enable_source, the sysfs mode request
>> bails out early assuming that the "csdev->enable == true" set by the
>> perf mode, to indicate the sysfs mode and goes ahead.
>>
>> Ideally, we should rely on the backend driver to do the check ?
> 
> As i said above. We can add mode check in etm4_disable() to fix that.
> But I didn't do this because I wanted sysfs to report a busy log. If the
> logs are unimportant, we do not need to add csdev->mode and can
> check the mode in etm4_disable().
> 
>>
>> Does the following patch fix problems for you ?
>>
> 
> I tested on my platform and the problem still exists.
> If the ETM has been used by the perf session, coresight_disable_source() 
> still
> responds to the disable csdev by sysfs.
> 
>    # perf record -e /cs_etm/@ultra_smb0/ -C 22 -- sleep 30 &
>    [2] 11157
>    # echo 1 > /sys/bus/coresight/devices/etm22/enable_source
>    # echo 0 > /sys/bus/coresight/devices/etm22/enable_source
>    [ 5308.708990] Unable to handle kernel NULL pointer dereference at 
> virtual address 00000000000001d0
>    [ 5308.834767] CPU: 2 PID: 11058 Comm: bash Kdump: loaded Tainted: 
> G           O       6.5.0-rc4+ #1
>    [ 5308.843894] Hardware name: Huawei TaiShan 200 (Model 
> 2280)/BC82AMDD, BIOS 2280-V2 CS V5.B221.01 12/09/2021
>    [ 5308.854030] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS 
> BTYPE=--)
>    [ 5308.861227] pc : etm4_disable+0x54/0x150 [coresight_etm4x]
>    [ 5308.866956] lr : coresight_disable_source+0x5c/0xb8 [coresight]
> 
> Best regards,
> Junhao.
> 
>>
>>
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
>> b/drivers/hwtracing/coresight/coresight-core.c
>> index 9fabe00a40d6..10e3609d8290 100644
>> --- a/drivers/hwtracing/coresight/coresight-core.c
>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>> @@ -381,15 +381,12 @@ int coresight_enable_source(struct 
>> coresight_device *csdev, enum cs_mode mode,
>>  {
>>      int ret;
>>
>> -    if (!csdev->enable) {
>> -        if (source_ops(csdev)->enable) {
>> -            ret = source_ops(csdev)->enable(csdev, data, mode);
>> -            if (ret)
>> -                return ret;
>> -        }
>> +    if (source_ops(csdev)->enable) {
>> +        ret = source_ops(csdev)->enable(csdev, data, mode);
>> +        if (ret)
>> +            return ret;
>>          csdev->enable = true;
>>      }
>> -
>>      atomic_inc(&csdev->refcnt);
>>
>>      return 0;
>> @@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct 
>> coresight_device *csdev)
>>   */
>>  bool coresight_disable_source(struct coresight_device *csdev, void 
>> *data)
>>  {
>> -    if (atomic_dec_return(&csdev->refcnt) == 0) {
>> +    if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) {
>>          if (source_ops(csdev)->disable)
>>              source_ops(csdev)->disable(csdev, data);
>>          coresight_disable_helpers(csdev);
>> @@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device 
>> *csdev)
>>      if (ret)
>>          goto out;
>>
>> -    if (!csdev->enable || !coresight_disable_source(csdev, NULL))
>> +    if (!coresight_disable_source(csdev, NULL))
>>          goto out;
>>
>>      switch (csdev->subtype.source_subtype) {
>>
>>

I am concerned about the "mode" changes without any synchronization.
i.e. earlier, coresight_enable_source() was called under coresight_mutex
from sysfs mode. But the perf code doesn't use any of that. So, it
is still possible to race and overwrite the "mode".

I think we need to take a closer look at the locking to make
sure this is clear.

Suzuki




More information about the linux-arm-kernel mailing list