Fwd: Bug: Potential KCOV Race Condition in __sanitizer_cov_trace_pc Leading to Crash at kcov.c:217

Tue Jan 14 01:38:57 PST 2025

-ENOCONTENT ?

On Sun, Jan 12, 2025 at 04:51:01PM +0800, Kun Hu wrote:
> 
> 
> > Begin forwarded message:
> > 
> > From: Kun Hu <huk23 at m.fudan.edu.cn>
> > Subject: Re: Bug: Potential KCOV Race Condition in __sanitizer_cov_trace_pc Leading to Crash at kcov.c:217
> > Date: January 12, 2025 at 16:35:31 GMT+8
> > To: vgupta at synopsys.com, Eugeniy.Paltsev at synopsys.com
> > Cc: andreyknvl at gmail.com, akpm at linux-foundation.org, elver at google.com, arnd at arndb.de, nogikh at google.com, kasan-dev at googlegroups.com, linux-kernel at vger.kernel.org, "jjtan24 at m.fudan.edu.cn" <jjtan24 at m.fudan.edu.cn>, Dmitry Vyukov <dvyukov at google.com>
> > 
> > 
> > 
> >> On January 10, 2025, at 20:13, Dmitry Vyukov <dvyukov at google.com> wrote:
> >> 
> >> On Fri, 10 Jan 2025 at 09:14, Kun Hu <huk23 at m.fudan.edu.cn> wrote:
> >>>>> HEAD commit: dbfac60febfa806abb2d384cb6441e77335d2799
> >>>>> git tree: upstream
> >>>>> Console output: https://drive.google.com/file/d/1rmVTkBzuTt0xMUS-KPzm9OafMLZVOAHU/view?usp=sharing
> >>>>> Kernel config: https://drive.google.com/file/d/1m1mk_YusR-tyusNHFuRbzdj8KUzhkeHC/view?usp=sharing
> >>>>> C reproducer: /
> >>>>> Syzlang reproducer: /
> >>>>> 
> >>>>> The crash in __sanitizer_cov_trace_pc at kernel/kcov.c:217 seems to be related to the handling of KCOV instrumentation when running in a preemption or IRQ-sensitive context. Specifically, the code might allow potential recursive invocations of __sanitizer_cov_trace_pc during early interrupt handling, which could lead to data races or inconsistent updates to the coverage area (kcov_area). It remains unclear whether this is a KCOV-specific issue or a rare edge case exposed by fuzzing.
> >>>> 
> >>>> Hi Kun,
> >>>> 
> >>>> How have you inferred this from the kernel oops?
> >>>> I only see a stall that may have just happened to be caught inside of
> >>>> __sanitizer_cov_trace_pc function since it's executed often in an
> >>>> instrumented kernel.
> >>>> 
> >>>> Note: on syzbot we don't report stalls on instances that have
> >>>> perf_event_open enabled, since perf has known bugs that lead to
> >>>> stalls all over the kernel.
> >>> 
> >>> Hi Dmitry,
> >>> 
> >>> Please allow me to ask for your advice:
> >>> 
> >>> We obtained new C and Syzlang reproducers over multiple rounds of reproduction. Indeed, the location of the issue has varied (for example, BUG: soft lockup in tmigr_handle_remote in ./kernel/time/timer_migration.c). The crash log, along with the C and Syzlang reproducers, is provided below:
> >>> 
> >>> Crash log: https://drive.google.com/file/d/16YDP6bU3Ga8OI1l7hsNFG4EdvjxuBz8d/view?usp=sharing
> >>> C reproducer: https://drive.google.com/file/d/1BHDc6XdXsat07yb94h6VWJ-jIIKhwPfn/view?usp=sharing
> >>> Syzlang reproducer: https://drive.google.com/file/d/1qo1qfr0KNbyIK909ddAo6uzKnrDPdGyV/view?usp=sharing
> >>> 
> >>> Should I report the issue to the maintainer responsible for “timer_migration.c”?
> >> 
> >> If it shows stalls in 2 locations, I assume it can show stalls all
> >> over the kernel.
> >> 
> >> The only thing the reproducer is doing is perf_event_open, so I would
> >> assume the issue is related to perf.
> > 
> > Thanks to Dmitry,
> > 
> > Hi perf maintainers,
> > 
> > We reproduced the issue over multiple rounds.
> > 
> > Does the frequent occurrence of perf_callchain_kernel in the call chain indicate a possible problem with the call chain logging or processing logic for performance events?
> > 
> > We lack the relevant technical background. Could you help us identify the cause of the issue?
> > 
> > ————
> > Thanks,
> > Kun Hu.
> > 
>