[PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
Peter Zijlstra
peterz at infradead.org
Fri Aug 15 04:39:51 PDT 2025
On Wed, Aug 06, 2025 at 12:56:31PM -0700, Sean Christopherson wrote:
> Add arch hooks to the mediated vPMU load/put APIs, and use the hooks to
> switch PMIs to the dedicated mediated PMU IRQ vector on load, and back to
> perf's standard NMI when the guest context is put. I.e. route PMIs to
> PERF_GUEST_MEDIATED_PMI_VECTOR when the guest context is active, and to
> NMIs while the host context is active.
>
> While running with guest context loaded, ignore all NMIs (in perf). Any
> NMI that arrives while the LVTPC points at the mediated PMU IRQ vector
> can't possibly be due to a host perf event.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang at linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang at linux.intel.com>
> Signed-off-by: Mingwei Zhang <mizhang at google.com>
> [sean: use arch hook instead of per-PMU callback]
> Signed-off-by: Sean Christopherson <seanjc at google.com>
> ---
>  arch/x86/events/core.c     | 27 +++++++++++++++++++++++++++
>  include/linux/perf_event.h |  3 +++
>  kernel/events/core.c       |  4 ++++
>  3 files changed, 34 insertions(+)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 7610f26dfbd9..9b0525b252f1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -55,6 +55,8 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
>          .pmu = &pmu,
>  };
>
> +static DEFINE_PER_CPU(bool, x86_guest_ctx_loaded);
> +
>  DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
>  DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
>  DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
> @@ -1756,6 +1758,16 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>          u64 finish_clock;
>          int ret;
>
> +        /*
> +         * Ignore all NMIs when a guest's mediated PMU context is loaded. Any
> +         * such NMI can't be due to a PMI as the CPU's LVTPC is switched to/from
> +         * the dedicated mediated PMI IRQ vector while host events are quiesced.
> +         * Attempting to handle a PMI while the guest's context is loaded will
> +         * generate false positives and clobber guest state.
> +         */
> +        if (this_cpu_read(x86_guest_ctx_loaded))
> +                return NMI_DONE;
> +
>          /*
>           * All PMUs/events that share this PMI handler should make sure to
>           * increment active_events for their events.
> @@ -2727,6 +2739,21 @@ static struct pmu pmu = {
>          .filter = x86_pmu_filter,
>  };
>
> +void arch_perf_load_guest_context(unsigned long data)
> +{
> +        u32 masked = data & APIC_LVT_MASKED;
> +
> +        apic_write(APIC_LVTPC,
> +                   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
> +        this_cpu_write(x86_guest_ctx_loaded, true);
> +}
> +
> +void arch_perf_put_guest_context(void)
> +{
> +        this_cpu_write(x86_guest_ctx_loaded, false);
> +        apic_write(APIC_LVTPC, APIC_DM_NMI);
> +}
> +
>  void arch_perf_update_userpage(struct perf_event *event,
>                                 struct perf_event_mmap_page *userpg, u64 now)
>  {
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0c529fbd97e6..3a9bd9c4c90e 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1846,6 +1846,9 @@ static inline unsigned long perf_arch_guest_misc_flags(struct pt_regs *regs)
>  # define perf_arch_guest_misc_flags(regs) perf_arch_guest_misc_flags(regs)
>  #endif
>
> +extern void arch_perf_load_guest_context(unsigned long data);
> +extern void arch_perf_put_guest_context(void);
> +
>  static inline bool needs_branch_stack(struct perf_event *event)
>  {
>          return event->attr.branch_sample_type != 0;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e1df3c3bfc0d..ad22b182762e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
>                  task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
>          }
>
> +        arch_perf_load_guest_context(data);
So I still don't understand why this ever needs to reach the generic
code. The x86 PMU driver and x86 KVM can surely sort this out entirely
inside of arch/x86, no?
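
Something like the below is the shape I have in mind. Illustration
only: the x86_perf_*_guest_lvtpc() names, the EXPORT_SYMBOL_GPL()s, the
KVM call sites, and the existence/ordering of a generic
perf_put_guest_context() counterpart are all made up here, not taken
from the series.

  /* arch/x86/include/asm/perf_event.h */
  extern void x86_perf_load_guest_lvtpc(unsigned long data);
  extern void x86_perf_put_guest_lvtpc(void);

  /* arch/x86/events/core.c -- same bodies as the arch hooks above */
  void x86_perf_load_guest_lvtpc(unsigned long data)
  {
          u32 masked = data & APIC_LVT_MASKED;

          /* Redirect the LVTPC to the mediated vector; guest takes over. */
          apic_write(APIC_LVTPC,
                     APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
          this_cpu_write(x86_guest_ctx_loaded, true);
  }
  EXPORT_SYMBOL_GPL(x86_perf_load_guest_lvtpc);

  void x86_perf_put_guest_lvtpc(void)
  {
          /* Back to NMI delivery before host events get rescheduled. */
          this_cpu_write(x86_guest_ctx_loaded, false);
          apic_write(APIC_LVTPC, APIC_DM_NMI);
  }
  EXPORT_SYMBOL_GPL(x86_perf_put_guest_lvtpc);

  /* KVM x86 (e.g. arch/x86/kvm/pmu.c), bracketing the generic calls */
  perf_load_guest_context(data);   /* generic: schedules host events out */
  x86_perf_load_guest_lvtpc(data); /* x86-only: flips the LVTPC */
  ...
  x86_perf_put_guest_lvtpc();      /* x86-only: restores NMI delivery */
  perf_put_guest_context();        /* generic: schedules host events in */

That way kernel/events/ never has to know the LVTPC exists, and the
arch_perf_*_guest_context() hooks in generic code go away.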