[PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

Fri Oct 30 03:59:56 EDT 2020

Hello guys,

On 2020-10-24 02:07, Mathieu Poirier wrote:
> On Fri, Oct 23, 2020 at 03:44:16PM +0200, Peter Zijlstra wrote:
>> On Fri, Oct 23, 2020 at 02:29:54PM +0100, Suzuki Poulose wrote:
>> > On 10/23/20 2:16 PM, Peter Zijlstra wrote:
>> > > On Fri, Oct 23, 2020 at 01:56:47PM +0100, Suzuki Poulose wrote:
>> 
>> > > > That way another session could use the same sink if it is free. i.e
>> > > >
>> > > > perf record -e cs_etm/@sink0/u --per-thread app1
>> > > >
>> > > > and
>> > > >
>> > > > perf record -e cs_etm/@sink0/u --per-thread app2
>> > > >
>> > > > both can work as long as the sink is not used by the other session.
>> > >
>> > > Like said above, if sink is shared between CPUs, that's going to be a
>> > > trainwreck :/ Why do you want that?
>> >
>> > That ship has sailed. That is how the current generation of systems are,
>> > unfortunately. But as I said, this is changing and there are guidelines
>> > in place to avoid these kind of topologies. With the future
>> > technologies, this will be completely gone.
>> 
>> I understand that the hardware is like that, but why do you want to
>> support this insanity in software?
>> 
>> If you only allow a single sink user (group) at the same time, your
>> problem goes away. Simply disallow the above scenario, do not allow
>> concurrent sink users if sinks are shared like this.
>> 
>> Have the perf-record of app2 above fail because the sink is in use
>> already.
> 
> I agree with you that --per-thread scenarios are easy to deal with, but to
> support cpu-wide scenarios events must share a sink (because there is one event
> per CPU).  CPU-wide support can't be removed because it has been around
> for close to a couple of years and heavily used. I also think using the pid of
> the process that created the events, i.e perf, is a good idea.  We just need to
> agree on how to gain access to it.
> 
> In Sai's patch you objected to the following:
> 
>> +     struct task_struct *task = READ_ONCE(event->owner);
>> +
>> +     if (!task || is_kernel_event(event))
> 
> Would it be better to use task_nr_pid(current) instead of event->owner?  The end
> result will be exactly the same.  There is also no need to check the validity of
> @current since it is a user process.
> 

We have devices deployed where these crashes are seen consistently,
so for some immediate relief, could we at least get a fix in this
cycle without a major design overhaul, which would likely take more time?
That could be my first patch [1], without any check for the owner, or
I can post a new version as Suzuki suggested [2], dropping the export
of is_kernel_event(). We can then build on top of it based on the
conclusion of this discussion, and at least the systems will not
crash in the meantime. Thoughts?
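
For context, the sink ownership rule under discussion (one owner pid at a
time, other users turned away) can be modelled in userspace roughly as
below. This is only a sketch under assumptions: struct sink_model,
sink_enable() and sink_disable() are hypothetical names made up for
illustration, not the driver code, and the pid is simply passed in by the
caller because how to obtain it in the driver is exactly what is still open.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model of a shared sink; not the driver's struct. */
struct sink_model {
	int owner_pid;	/* SINK_FREE while nobody uses the sink */
	int refcnt;	/* number of events currently using the sink */
};

#define SINK_FREE (-1)

/* Claim the sink for @pid, or fail with -EBUSY if another pid holds it. */
static int sink_enable(struct sink_model *sink, int pid)
{
	if (sink->owner_pid != SINK_FREE && sink->owner_pid != pid)
		return -EBUSY;

	/* Same owner (e.g. one event per CPU): just take another reference. */
	sink->owner_pid = pid;
	sink->refcnt++;
	return 0;
}

/* Drop one reference; the last user releases ownership of the sink. */
static void sink_disable(struct sink_model *sink)
{
	assert(sink->refcnt > 0);
	if (--sink->refcnt == 0)
		sink->owner_pid = SINK_FREE;
}
```

With this model, events created by the same pid share the sink (the
cpu-wide case), while a second session with a different pid gets -EBUSY
until the first one has dropped all its references.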

[1] https://lore.kernel.org/patchwork/patch/1318098/
[2] https://lore.kernel.org/lkml/fa6cdf34-88a0-1050-b9ea-556d0a9438cb@arm.com/

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation