[PATCH 1/2] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

Tue Oct 20 12:10:12 EDT 2020

On 2020-10-14 21:29, Sai Prakash Ranjan wrote:
> On 2020-10-14 18:46, Suzuki K Poulose wrote:
>> On 10/14/2020 10:36 AM, Sai Prakash Ranjan wrote:
>>> On 2020-10-13 22:05, Suzuki K Poulose wrote:
>>>> On 10/07/2020 02:00 PM, Sai Prakash Ranjan wrote:
>>>>> There was a report of NULL pointer dereference in ETF enable
>>>>> path for perf CS mode with PID monitoring. It is almost 100%
>>>>> reproducible when the process to monitor is something very
>>>>> active such as chrome and with ETF as the sink and not ETR.
>>>>> Currently, in a bid to find the pid, the owner is dereferenced
>>>>> via a task_pid_nr() call in tmc_enable_etf_sink_perf(), and with
>>>>> the owner being NULL, we get a NULL pointer dereference.
>>>>> 
>>>>> Looking at the ETR and other places in the kernel, ETF and the
>>>>> ETB are the only places trying to dereference the task (owner)
>>>>> in tmc_enable_etf_sink_perf(), which is also called from the
>>>>> sched_in path as in the call trace. The owner (task) is NULL even
>>>>> in the case of ETR in tmc_enable_etr_sink_perf(), but since we
>>>>> cache the PID in the alloc_buffer() callback, which is done as
>>>>> part of etm_setup_aux() when allocating the buffer for the ETR
>>>>> sink, we never dereference this NULL pointer and we are safe.
>>>>> So let's do the
>>>> 
>>>> The patch is necessary to fix some of the issues, but I feel it is
>>>> not complete. Why is it safe earlier and not later? I believe we are
>>>> simply reducing the chances of hitting the issue by doing this earlier
>>>> rather than later. I would say we had better fix all instances to make
>>>> sure that the event->owner is valid. (e.g., I can see that for kernel
>>>> events, event->owner == -1?)
>>>> 
>>>> struct task_struct *tsk = READ_ONCE(event->owner);
>>>> 
>>>> if (!tsk || is_kernel_event(event))
>>>>    /* skip ? */
>>>> 
>>> 
>>> Looking at it some more, is_kernel_event() is not exposed
>>> outside events core and probably for good reason. Why do
>>> we need to check for this and not just tsk?
>> 
>> Because the event->owner could be:
>> 
>>  = NULL
>>  = -1UL  // kernel event
>>  = valid.
>> 
> 
> Yes, I understood that part, but here we are trying to
> fix the NULL pointer dereference, right? Hence the
> question as to why we need to check for kernel events.
> I am no expert in perf, but I don't see anywhere else in
> the kernel checking for is_kernel_event(), so I am a bit
> skeptical whether exporting it is actually right.
> 
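
To make the three owner states Suzuki listed concrete, here is a rough
userspace model (not kernel code) of the check he suggests. The types,
OWNER_TOMBSTONE and owner_pid_or_invalid() are hypothetical stand-ins:
OWNER_TOMBSTONE is only meant to mirror the -1 sentinel that, as far as
I can tell, is_kernel_event() compares event->owner against, and the
plain load stands in for READ_ONCE().

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct task_struct / struct perf_event. */
struct task { int pid; };
struct event { struct task *owner; };

/* Hypothetical sentinel mirroring the (void *)-1 owner of kernel events. */
#define OWNER_TOMBSTONE ((struct task *)-1L)

/*
 * Return the owner's PID, or -1 if the owner must not be dereferenced:
 * either there is no owner (NULL) or it is a kernel event (sentinel).
 */
static int owner_pid_or_invalid(const struct event *ev)
{
	/* Read the owner once; kernel code would use READ_ONCE() here. */
	struct task *tsk = ev->owner;

	if (tsk == NULL || tsk == OWNER_TOMBSTONE)
		return -1;
	return tsk->pid;
}
```

This only classifies the owner at the moment it is read; on its own it
does not show that the owner stays valid afterwards, which is the part
Suzuki is questioning.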

I have stress tested with the original patch many times
now, i.e., without a check for event->owner or is_kernel_event(),
and did not observe any crashes. Plus, on ETR, where this was
already done, no crashes have been reported to date, while with
ETF the issue was quickly reproducible. So I am fairly confident
that this doesn't just delay the original issue but actually fixes
it. I will run an overnight test again to confirm this.
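
For anyone following along, the approach in the patch (take the PID
once when the buffer is allocated, and use only the cached value in the
enable path) can be sketched as a small userspace model. Everything
below is hypothetical: the structs, model_alloc_buffer() and
model_enable_sink() are simplified stand-ins for the alloc_buffer()
callback and tmc_enable_etf_sink_perf(), not the driver code, and the
-EBUSY check for a different PID is an assumption about how the sink
is shared between sessions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real perf/coresight types. */
struct task { int pid; };
struct event { struct task *owner; };
struct sink_buf { int pid; };         /* per-session buffer */
struct sink { int pid; int refcnt; }; /* pid == -1 while the sink is idle */

/*
 * Buffer allocation (the alloc_buffer() step): runs while the event is
 * being set up, so the owner is expected to be valid here. The NULL
 * check is only a defensive guard in this model. Cache the owner's PID.
 */
static int model_alloc_buffer(struct sink_buf *buf, const struct event *ev)
{
	if (ev->owner == NULL)
		return -EINVAL;
	buf->pid = ev->owner->pid;
	return 0;
}

/*
 * Enable path (the sched_in step): uses only the cached PID and never
 * touches ev->owner, so a NULL owner at this point cannot crash it.
 */
static int model_enable_sink(struct sink *s, const struct sink_buf *buf)
{
	/* Assumed rule: a sink already tracing another PID is busy. */
	if (s->pid != -1 && s->pid != buf->pid)
		return -EBUSY;
	s->pid = buf->pid;
	s->refcnt++;
	return 0;
}
```

The model only shows that the enable path no longer depends on
event->owner; whether the owner is always valid at allocation time is
exactly what Suzuki is asking about, and the model simply assumes it.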

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation