[PATCH v4 01/17] perf: Protect perf_guest_cbs with RCU
Paolo Bonzini
pbonzini at redhat.com
Wed Nov 10 23:26:58 PST 2021
On 11/11/21 03:07, Sean Christopherson wrote:
> Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily,
> all paths that read perf_guest_cbs already require RCU protection, e.g. to
> protect the callback chains, so only the direct perf_guest_cbs touchpoints
> need to be modified.
>
> Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
> perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
> Fixed via the READ_ONCE() in rcu_dereference().
>
> Bug #2 is that on weakly-ordered architectures, updates to the callbacks
> themselves are not guaranteed to be visible before the pointer is made
> visible to readers. Fixed by the smp_store_release() in
> rcu_assign_pointer() when the new pointer is non-NULL.
>
> Bug #3 is that, because the callbacks are global, it's possible for
> readers to run in parallel with an unregisters, and thus a module
> implementing the callbacks can be unloaded while readers are in flight,
> resulting in a use-after-free. Fixed by a synchronize_rcu() call when
> unregistering callbacks.
>
> Bug #1 escaped notice because it's extremely unlikely a compiler will
> reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded
> for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest()
> guard all but guarantees the consumer will win the race, e.g. to nullify
> perf_guest_cbs, KVM has to completely exit the guest and teardown down
> all VMs before KVM start its module unload / unregister sequence. This
> also makes it all but impossible to encounter bug #3.
>
> Bug #2 has not been a problem because all architectures that register
> callbacks are strongly ordered and/or have a static set of callbacks.
>
> But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping
> perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
> kvm_intel module load/unload leads to:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:perf_misc_flags+0x1c/0x70
> Call Trace:
> perf_prepare_sample+0x53/0x6b0
> perf_event_output_forward+0x67/0x160
> __perf_event_overflow+0x52/0xf0
> handle_pmi_common+0x207/0x300
> intel_pmu_handle_irq+0xcf/0x410
> perf_event_nmi_handler+0x28/0x50
> nmi_handle+0xc7/0x260
> default_do_nmi+0x6b/0x170
> exc_nmi+0x103/0x130
> asm_exc_nmi+0x76/0xbf
>
> Fixes: 39447b386c84 ("perf: Enhance perf to allow for guest statistic collection from host")
> Cc: stable at vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc at google.com>
> ---
Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
One nit:
> EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
>
> int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
> {
> - perf_guest_cbs = NULL;
> + if (WARN_ON_ONCE(rcu_access_pointer(perf_guest_cbs) != cbs))
> + return -EINVAL;
> +
> + rcu_assign_pointer(perf_guest_cbs, NULL);
> + synchronize_rcu();
This technically could be RCU_INIT_POINTER but it's not worth a respin.
There are dozens of other occurrences, and if somebody wanted they
could use Coccinelle to fix all of them.
Paolo
More information about the linux-riscv
mailing list