[PATCH 00/15] perf: KVM: Fix, optimize, and clean up callbacks
Like Xu
like.xu.linux at gmail.com
Thu Aug 26 23:52:25 PDT 2021
+ STATIC BRANCH/CALL friends.
On 27/8/2021 8:57 am, Sean Christopherson wrote:
> This started out as a small series[1] to fix a KVM bug related to Intel PT
> interrupt handling and snowballed horribly.
>
> The main problem being addressed is that the perf_guest_cbs are shared by
> all CPUs, can be nullified by KVM during module unload, and are not
> protected against concurrent access from NMI context.
Shouldn't this be a generic issue of the static_call() usage ?
At the beginning, we set up the static entry assuming perf_guest_cbs != NULL:
if (perf_guest_cbs && perf_guest_cbs->handle_intel_pt_intr) {
static_call_update(x86_guest_handle_intel_pt_intr,
perf_guest_cbs->handle_intel_pt_intr);
}
and then we unset the perf_guest_cbs and do the static function call like this:
DECLARE_STATIC_CALL(x86_guest_handle_intel_pt_intr,
*(perf_guest_cbs->handle_intel_pt_intr));
static int handle_pmi_common(struct pt_regs *regs, u64 status)
{
...
if (!static_call(x86_guest_handle_intel_pt_intr)())
intel_pt_interrupt();
...
}
Can we make static_call() back to the original "(void *)&__static_call_return0"
in this case ?
>
> The bug has escaped notice because all dereferences of perf_guest_cbs
> follow the same "perf_guest_cbs && perf_guest_cbs->is_in_guest()" pattern,
> and AFAICT the compiler never reloads perf_guest_cbs in this sequence.
> The compiler does reload perf_guest_cbs for any future dereferences, but
> the ->is_in_guest() guard all but guarantees the PMI handler will win the
> race, e.g. to nullify perf_guest_cbs, KVM has to completely exit the guest
> and teardown down all VMs before it can be unloaded.
>
> But with help, e.g. READ_ONCE(perf_guest_cbs), unloading kvm_intel can
> trigger a NULL pointer derference (see below). Manual intervention aside,
> the bug is a bit of a time bomb, e.g. my patch 3 from the original PT
> handling series would have omitted the ->is_in_guest() guard.
>
> This series fixes the problem by making the callbacks per-CPU, and
> registering/unregistering the callbacks only with preemption disabled
> (except for the Xen case, which doesn't unregister).
>
> This approach also allows for several nice cleanups in this series.
> KVM x86 and arm64 can share callbacks, KVM x86 drops its somewhat
> redundant current_vcpu, and the retpoline that is currently hit when KVM
> is loaded (due to always checking ->is_in_guest()) goes away (it's still
> there when running as Xen Dom0).
>
> Changing to per-CPU callbacks also provides a good excuse to excise
> copy+paste code from architectures that can't possibly have guest
> callbacks.
>
> This series conflicts horribly with a proposed patch[2] to use static
> calls for perf_guest_cbs. But that patch is broken as it completely
> fails to handle unregister, and it's not clear to me whether or not
> it can correctly handle unregister without fixing the underlying race
> (I don't know enough about the code patching for static calls).
>
> This tweak
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 1eb45139fcc6..202e5ad97f82 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2954,7 +2954,7 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
> {
> int misc = 0;
>
> - if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> + if (READ_ONCE(perf_guest_cbs) && READ_ONCE(perf_guest_cbs)->is_in_guest()) {
> if (perf_guest_cbs->is_user_mode())
> misc |= PERF_RECORD_MISC_GUEST_USER;
> else
>
> while spamming module load/unload leads to:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:perf_misc_flags+0x1c/0x70
> Call Trace:
> perf_prepare_sample+0x53/0x6b0
> perf_event_output_forward+0x67/0x160
> __perf_event_overflow+0x52/0xf0
> handle_pmi_common+0x207/0x300
> intel_pmu_handle_irq+0xcf/0x410
> perf_event_nmi_handler+0x28/0x50
> nmi_handle+0xc7/0x260
> default_do_nmi+0x6b/0x170
> exc_nmi+0x103/0x130
> asm_exc_nmi+0x76/0xbf
>
> [1] https://lkml.kernel.org/r/20210823193709.55886-1-seanjc@google.com
> [2] https://lkml.kernel.org/r/20210806133802.3528-2-lingshan.zhu@intel.com
>
> Sean Christopherson (15):
> KVM: x86: Register perf callbacks after calling vendor's
> hardware_setup()
> KVM: x86: Register Processor Trace interrupt hook iff PT enabled in
> guest
> perf: Stop pretending that perf can handle multiple guest callbacks
> perf: Force architectures to opt-in to guest callbacks
> perf: Track guest callbacks on a per-CPU basis
> KVM: x86: Register perf callbacks only when actively handling
> interrupt
> KVM: Use dedicated flag to track if KVM is handling an NMI from guest
> KVM: x86: Drop current_vcpu in favor of kvm_running_vcpu
> KVM: arm64: Register/unregister perf callbacks at vcpu load/put
> KVM: Move x86's perf guest info callbacks to generic KVM
> KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
> KVM: arm64: Convert to the generic perf callbacks
> KVM: arm64: Drop perf.c and fold its tiny bit of code into pmu.c
> perf: Disallow bulk unregistering of guest callbacks and do cleanup
> perf: KVM: Indicate "in guest" via NULL ->is_in_guest callback
>
> arch/arm/kernel/perf_callchain.c | 28 ++------------
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/kvm_host.h | 8 +++-
> arch/arm64/kernel/perf_callchain.c | 18 ++++++---
> arch/arm64/kvm/Makefile | 2 +-
> arch/arm64/kvm/arm.c | 13 ++++++-
> arch/arm64/kvm/perf.c | 62 ------------------------------
> arch/arm64/kvm/pmu.c | 8 ++++
> arch/csky/kernel/perf_callchain.c | 10 -----
> arch/nds32/kernel/perf_event_cpu.c | 29 ++------------
> arch/riscv/kernel/perf_callchain.c | 10 -----
> arch/x86/Kconfig | 1 +
> arch/x86/events/core.c | 17 +++++---
> arch/x86/events/intel/core.c | 7 ++--
> arch/x86/include/asm/kvm_host.h | 4 +-
> arch/x86/kvm/pmu.c | 2 +-
> arch/x86/kvm/pmu.h | 1 +
> arch/x86/kvm/svm/svm.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 25 ++++++++++--
> arch/x86/kvm/x86.c | 54 +++-----------------------
> arch/x86/kvm/x86.h | 12 +++---
> arch/x86/xen/pmu.c | 2 +-
> include/kvm/arm_pmu.h | 1 +
> include/linux/kvm_host.h | 12 ++++++
> include/linux/perf_event.h | 33 ++++++++++++----
> init/Kconfig | 3 ++
> kernel/events/core.c | 28 ++++++++------
> virt/kvm/kvm_main.c | 42 ++++++++++++++++++++
> 28 files changed, 204 insertions(+), 231 deletions(-)
> delete mode 100644 arch/arm64/kvm/perf.c
>
More information about the linux-riscv
mailing list