[PATCH v3 0/8] Generic IPI sending tracepoint
Palmer Dabbelt
palmer at dabbelt.com
Tue Dec 13 08:18:13 PST 2022
On Fri, 02 Dec 2022 07:58:09 PST (-0800), vschneid at redhat.com wrote:
> Background
> ==========
>
> Detecting IPI *reception* is relatively easy, e.g. using
> trace_irq_handler_{entry,exit} or even just function-trace
> flush_smp_call_function_queue() for SMP calls.
>
> Figuring out their *origin* is trickier as there is no generic tracepoint tied
> to e.g. smp_call_function():
>
> o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
> (cf. trace_call_function{_single}_entry()).
> o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
> mostly useless string (smp_calls will all be "Function call interrupts").
> o Other architectures don't seem to have any IPI-sending related tracepoint.
>
> I believe one reason those tracepoints used by arm/arm64 ended up as they were
> is that these archs used to handle IPIs differently from regular interrupts
> (the IRQ driver would directly invoke an IPI-handling routine), which meant they
> never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit}
> tracepoints gave a way to trace IPI reception but those have become redundant as
> of:
>
> 56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
> d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")
>
> which gave IPIs a "proper" handler function used through
> generic_handle_domain_irq(), which makes them show up via
> trace_irq_handler_{entry, exit}.
>
> Changing stuff up
> =================
>
> Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
> into generic code. This also came up during Daniel's talk on Osnoise at the CPU
> isolation MC of LPC 2022 [1].
>
> Now, to be useful, such a tracepoint needs to export:
> o targeted CPU(s)
> o calling context
>
> The only way to get the calling context with trace_ipi_raise() is to trigger a
> stack dump, e.g. $(trace-cmd record -e 'ipi*' -T echo 42).
>
> This series instead introduces a new tracepoint which exports the relevant context
> (callsite, and requested callback for when the callsite isn't helpful), and is
> usable by all architectures as it sits in generic code.
>
> Another thing worth mentioning is that depending on the callsite, the _RET_IP_
> fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
> you much about the actual callback being sent via IPI, which is why the new
> tracepoint also has a @callback argument.
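
A minimal userspace sketch of the callsite/callback distinction described
above. It is not code from the series: demo_send(), struct demo_ipi_record and
the demo_* callbacks are hypothetical names, and DEMO_RET_IP only mirrors how
the kernel's _RET_IP_ is built on __builtin_return_address(0).

```c
/* Userspace stand-in for the kernel's _RET_IP_ (GCC/Clang builtin). */
#define DEMO_RET_IP ((unsigned long)__builtin_return_address(0))

typedef void (*demo_func_t)(void *info);

/* Hypothetical record of the two context fields discussed above. */
struct demo_ipi_record {
	unsigned long callsite;	/* address the helper returns to */
	demo_func_t callback;	/* function requested to run remotely */
};

static struct demo_ipi_record demo_last;

/* Two illustrative callbacks; they do nothing in this sketch. */
static void demo_flush(void *info) { (void)info; }
static void demo_sync(void *info)  { (void)info; }

/*
 * Stand-in for a generic send helper. noinline keeps a real call frame so
 * the return address points into whoever called this helper.
 */
static __attribute__((noinline)) void demo_send(demo_func_t func)
{
	demo_last.callsite = DEMO_RET_IP;
	demo_last.callback = func;
}
```

When every sender goes through one shared helper, the recorded callsite only
identifies that helper's caller, so the callback field is what tells
demo_send(demo_flush) apart from demo_send(demo_sync).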
>
> Patches
> =======
>
> o Patch 1 is included for convenience and will be merged independently. FYI I
> have libtraceevent patches [2] to improve the
> pretty-printing of cpumasks using the new type, which look like:
> <...>-3322 [021] 560.402583: ipi_send_cpumask: cpumask=14,17,21 callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0
> <...>-187 [010] 562.590584: ipi_send_cpumask: cpumask=0-23 callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0
>
> o Patches 2-6 spread out the tracepoint across relevant sites.
> Patch 6 ends up sprinkling lots of #include <trace/events/ipi.h> which I'm not
> the biggest fan of, but is the least horrible solution I've been able to come
> up with so far.
>
> o Patch 8 is trying to be smart about tracing the callback associated with the
> IPI.
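
For reference, the cpumask=14,17,21 and cpumask=0-23 fields in the sample
output above use a range-list notation. The following is a minimal userspace
sketch of that formatting, assuming a mask of at most 64 CPUs. It is not the
libtraceevent or kernel implementation, and mask_to_cpulist() is a
hypothetical helper name.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Format the set bits of a 64-bit mask as a CPU range list, e.g. "0-3,8".
 * Returns the number of characters written (excluding the NUL terminator),
 * or -1 if buf is too small.
 */
static int mask_to_cpulist(uint64_t mask, char *buf, size_t len)
{
	size_t pos = 0;
	int cpu = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (cpu < 64) {
		int start, end, n;

		/* Skip clear bits until the next run of set bits. */
		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Consume one run of consecutive set bits. */
		start = cpu;
		while (cpu < 64 && (mask & (UINT64_C(1) << cpu)))
			cpu++;
		end = cpu - 1;

		/* Emit "N" for a single CPU, "N-M" for a run, comma separated. */
		if (start == end)
			n = snprintf(buf + pos, len - pos, "%s%d",
				     pos ? "," : "", start);
		else
			n = snprintf(buf + pos, len - pos, "%s%d-%d",
				     pos ? "," : "", start, end);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

Passing a mask with bits 14, 17 and 21 set yields "14,17,21", and a mask with
the low 24 bits set yields "0-23".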
>
> This results in having IPI trace events for:
>
> o smp_call_function*()
> o smp_send_reschedule()
> o irq_work_queue*()
> o standalone uses of __smp_call_single_queue()
>
> This is incomplete: just looking at arm64, there are more IPI types that aren't
> covered:
>
> IPI_CPU_STOP,
> IPI_CPU_CRASH_STOP,
> IPI_TIMER,
> IPI_WAKEUP,
>
> ... But it feels like a good starting point.
>
> Links
> =====
>
> [1]: https://youtu.be/5gT57y4OzBM?t=14234
> [2]: https://lore.kernel.org/all/20221116144154.3662923-1-vschneid@redhat.com/
>
> Revisions
> =========
>
> v2 -> v3
> ++++++++
>
> o Dropped the generic export of smp_send_reschedule(), turned it into a macro
> and a bunch of imports
> o Dropped the send_call_function_single_ipi() macro madness, split it into sched
> and smp bits using some of Peter's suggestions
>
> v1 -> v2
> ++++++++
>
> o Ditched single-CPU tracepoint
> o Changed tracepoint signature to include callback
> o Changed tracepoint callsite field to void *; the parameter is still UL to save
> on casts due to using _RET_IP_.
> o Fixed linking failures due to not exporting smp_send_reschedule()
>
> Steven Rostedt (Google) (1):
> tracing: Add __cpumask to denote a trace event field that is a
> cpumask_t
>
> Valentin Schneider (7):
> trace: Add trace_ipi_send_cpumask()
> sched, smp: Trace IPIs sent via send_call_function_single_ipi()
> smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
> irq_work: Trace self-IPIs sent via arch_irq_work_raise()
> treewide: Trace IPIs sent via smp_send_reschedule()
> smp: reword smp call IPI comment
> sched, smp: Trace smp callback causing an IPI
>
> arch/alpha/kernel/smp.c | 2 +-
> arch/arc/kernel/smp.c | 2 +-
> arch/arm/kernel/smp.c | 5 +-
> arch/arm/mach-actions/platsmp.c | 2 +
> arch/arm64/kernel/smp.c | 3 +-
> arch/csky/kernel/smp.c | 2 +-
> arch/hexagon/kernel/smp.c | 2 +-
> arch/ia64/kernel/smp.c | 4 +-
> arch/loongarch/include/asm/smp.h | 2 +-
> arch/mips/include/asm/smp.h | 2 +-
> arch/mips/kernel/rtlx-cmp.c | 2 +
> arch/openrisc/kernel/smp.c | 2 +-
> arch/parisc/kernel/smp.c | 4 +-
> arch/powerpc/kernel/smp.c | 6 +-
> arch/powerpc/kvm/book3s_hv.c | 3 +
> arch/powerpc/platforms/powernv/subcore.c | 2 +
> arch/riscv/kernel/smp.c | 4 +-
> arch/s390/kernel/smp.c | 2 +-
> arch/sh/kernel/smp.c | 2 +-
> arch/sparc/kernel/smp_32.c | 2 +-
> arch/sparc/kernel/smp_64.c | 2 +-
> arch/x86/include/asm/smp.h | 2 +-
> arch/x86/kvm/svm/svm.c | 4 +
> arch/x86/kvm/x86.c | 2 +
> arch/xtensa/kernel/smp.c | 2 +-
> include/linux/smp.h | 8 +-
> include/trace/bpf_probe.h | 6 ++
> include/trace/events/ipi.h | 22 ++++++
> include/trace/perf.h | 6 ++
> include/trace/stages/stage1_struct_define.h | 6 ++
> include/trace/stages/stage2_data_offsets.h | 6 ++
> include/trace/stages/stage3_trace_output.h | 6 ++
> include/trace/stages/stage4_event_fields.h | 6 ++
> include/trace/stages/stage5_get_offsets.h | 6 ++
> include/trace/stages/stage6_event_callback.h | 20 +++++
> include/trace/stages/stage7_class_define.h | 2 +
> kernel/irq_work.c | 14 +++-
> kernel/sched/core.c | 19 +++--
> kernel/sched/smp.h | 2 +-
> kernel/smp.c | 78 ++++++++++++++++----
> samples/trace_events/trace-events-sample.c | 2 +-
> samples/trace_events/trace-events-sample.h | 34 +++++++--
> virt/kvm/kvm_main.c | 1 +
> 43 files changed, 250 insertions(+), 61 deletions(-)
Acked-by: Palmer Dabbelt <palmer at rivosinc.com> # riscv