[PATCH 0/3] arm_pmu: acpi: avoid allocations in atomic context
Mark Rutland
mark.rutland at arm.com
Fri Sep 30 04:18:41 PDT 2022
This series attempts to make the arm_pmu ACPI probing code play nicely
with PREEMPT_RT by moving work out of atomic context.
The arm_pmu ACPI probing code tries to do a number of things in atomic
context, which is generally not good and is especially problematic for
PREEMPT_RT, as reported by Valentin and Pierre:
https://lore.kernel.org/all/20210810134127.1394269-2-valentin.schneider@arm.com/
https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/
We'd previously tried to bodge around this, e.g. in commits:
* 0dc1a1851af1d593 ("arm_pmu: add armpmu_alloc_atomic()")
* 167e61438da0664c ("arm_pmu: acpi: request IRQs up-front")
... but this isn't good enough for PREEMPT_RT, and as reported by Pierre
the probing code can trigger splats:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
| in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0
| preempt_count: 0, expected: 0
| RCU nest depth: 0, expected: 0
| 3 locks held by cpuhp/0/24:
| #0: ffffd8a22c8870d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun (linux/kernel/cpu.c:754)
| #1: ffffd8a22c887120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun (linux/kernel/cpu.c:754)
| #2: ffff083e7f0d97b8 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc (linux/mm/slub.c:2954)
| irq event stamp: 42
| hardirqs last enabled at (41): finish_task_switch (linux/./arch/arm64/include/asm/irqflags.h:35)
| hardirqs last disabled at (42): cpuhp_thread_fun (linux/kernel/cpu.c:776 (discriminator 1))
| softirqs last enabled at (0): copy_process (linux/./include/linux/lockdep.h:191)
| softirqs last disabled at (0): 0x0
| CPU: 0 PID: 24 Comm: cpuhp/0 Tainted: G W 5.19.0-rc3-rt4-custom-piegon01-rt_0 #142
| Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
| Call trace:
| dump_backtrace (linux/arch/arm64/kernel/stacktrace.c:200)
| show_stack (linux/arch/arm64/kernel/stacktrace.c:207)
| dump_stack_lvl (linux/lib/dump_stack.c:107)
| dump_stack (linux/lib/dump_stack.c:114)
| __might_resched (linux/kernel/sched/core.c:9929)
| rt_spin_lock (linux/kernel/locking/rtmutex.c:1732 (discriminator 4))
| ___slab_alloc (linux/mm/slub.c:2954)
| __slab_alloc.isra.0 (linux/mm/slub.c:3116)
| kmem_cache_alloc_trace (linux/mm/slub.c:3207)
| __armpmu_alloc (linux/./include/linux/slab.h:600)
| armpmu_alloc_atomic (linux/drivers/perf/arm_pmu.c:927)
| arm_pmu_acpi_cpu_starting (linux/drivers/perf/arm_pmu_acpi.c:204)
| cpuhp_invoke_callback (linux/kernel/cpu.c:192)
| cpuhp_thread_fun (linux/kernel/cpu.c:777 (discriminator 3))
| smpboot_thread_fn (linux/kernel/smpboot.c:164 (discriminator 3))
| kthread (linux/kernel/kthread.c:376)
| ret_from_fork (linux/arch/arm64/kernel/entry.S:868)
Thomas Gleixner suggested that we could pre-allocate structures to avoid
this issue:
https://lore.kernel.org/all/87y299oyyq.ffs@tglx/
... and Pierre implemented that:
https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/
... but in practice this gets pretty hairy due to having to manage the
lifetime of those pre-allocated objects across various stages of the
probing flow.
This series reworks the code to perform all the allocation and
registration with perf at boot time, by scanning the set of online CPUs
and registering a PMU for each unique MIDR (which we use today to identify
distinct PMUs). This avoids the need for allocation in the hotplug
paths, and brings the ACPI probing code into line with the DT/platform
probing code.
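As a rough illustration of the boot-time scan described above, here is a
minimal userspace sketch in C. It is not the code from the patches: the
struct, the function names, the fixed-size table and the 64-bit CPU mask
are all hypothetical stand-ins, chosen only to show the "one PMU per
unique MIDR" grouping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PMUS 8	/* hypothetical bound, for illustration only */

/* Hypothetical stand-in for a registered PMU: one per unique MIDR. */
struct sketch_pmu {
	uint32_t midr;		/* MIDR shared by every CPU this PMU covers */
	uint64_t cpu_mask;	/* bit N set => CPU N is associated */
};

/* Return the PMU already created for @midr, or NULL if there is none. */
static struct sketch_pmu *sketch_find_pmu(struct sketch_pmu *pmus,
					  size_t nr_pmus, uint32_t midr)
{
	for (size_t i = 0; i < nr_pmus; i++) {
		if (pmus[i].midr == midr)
			return &pmus[i];
	}
	return NULL;
}

/*
 * Walk the CPUs that are online at boot (midrs[cpu] is that CPU's MIDR)
 * and create one PMU per distinct MIDR. All "allocation" happens here,
 * up front, rather than in a hotplug callback.
 * Returns the number of PMUs created.
 */
static size_t sketch_probe_boot_cpus(const uint32_t *midrs, size_t nr_cpus,
				     struct sketch_pmu *pmus)
{
	size_t nr_pmus = 0;

	assert(nr_cpus <= 64);	/* the mask in this sketch only has 64 bits */

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		struct sketch_pmu *pmu;

		pmu = sketch_find_pmu(pmus, nr_pmus, midrs[cpu]);
		if (!pmu) {
			/* First CPU seen with this MIDR: set up a new PMU. */
			assert(nr_pmus < SKETCH_MAX_PMUS);
			pmu = &pmus[nr_pmus++];
			pmu->midr = midrs[cpu];
			pmu->cpu_mask = 0;
		}
		pmu->cpu_mask |= UINT64_C(1) << cpu;
	}

	return nr_pmus;
}
```

In this model, a homogeneous system ends up with a single PMU covering
every boot-time CPU, and a system where every CPU has a distinct MIDR
ends up with one PMU per CPU.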
When a CPU is late hotplugged, either:
(a) It matches an existing PMU's MIDR, and will be associated with that
PMU.
(b) It does not match an existing PMU's MIDR, and will not be
associated with a PMU (and a warning is logged to dmesg).
Aside from the warning, this matches the existing behaviour, as we
only register CPU PMUs with perf at boot time, and not for late
hotplugged CPUs.
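The two late-hotplug cases can be modelled with a similarly small
userspace sketch. Again, the names and data layout are hypothetical and
are not taken from the patches; only the warning text mirrors the
message shown in the test output in this cover letter. The point is that
the hotplug path only looks up an existing PMU and never allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a PMU that was registered at boot time. */
struct hp_pmu {
	uint32_t midr;		/* MIDR this PMU was registered for */
	uint64_t cpu_mask;	/* bit N set => CPU N is associated */
};

/*
 * Model of a late-hotplugged CPU coming up. No allocation happens here:
 * the CPU is either matched against a PMU set up at boot, or left
 * without one. Returns true if the CPU was associated with a PMU.
 * In both cases the CPU itself is still considered online.
 */
static bool hp_cpu_starting(struct hp_pmu *pmus, size_t nr_pmus,
			    unsigned int cpu, uint32_t midr)
{
	assert(cpu < 64);	/* the mask in this sketch only has 64 bits */

	for (size_t i = 0; i < nr_pmus; i++) {
		if (pmus[i].midr == midr) {
			/* Case (a): matches an existing PMU's MIDR. */
			pmus[i].cpu_mask |= UINT64_C(1) << cpu;
			return true;
		}
	}

	/* Case (b): no matching PMU, so warn and carry on. */
	fprintf(stderr, "Unable to associate CPU%u with a PMU\n", cpu);
	return false;
}
```

With one boot-time PMU covering CPUs 0-7, bringing up CPU 10 with the
same MIDR adds it to that PMU's mask, while a CPU with an unknown MIDR
leaves every mask untouched.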
I've tested the series in a VM, using ACPI and faked MIDR values to test
a few homogeneous and heterogeneous configurations, using the 'maxcpus'
kernel argument to test the late-hotplug behaviour:
* On a system where all CPUs have the same MIDR, late-onlining a CPU causes it
to be associated with a matching PMU:
| # ls /sys/bus/event_source/devices/
| armv8_pmuv3_0 breakpoint software tracepoint
| # cat /sys/bus/event_source/devices/armv8_pmuv3_0/cpus
| 0-7
| # echo 1 > /sys/devices/system/cpu/cpu10/online
| Detected PIPT I-cache on CPU10
| GICv3: CPU10: found redistributor a region 0:0x00000000081e0000
| GICv3: CPU10: using allocated LPI pending table @0x00000000402b0000
| CPU10: Booted secondary processor 0x000000000a [0x431f0af1]
| # ls /sys/bus/event_source/devices/
| armv8_pmuv3_0 breakpoint software tracepoint
| # cat /sys/bus/event_source/devices/armv8_pmuv3_0/cpus
| 0-7,10
* On a system where all CPUs have a unique MIDR, each of the boot-time
CPUs gets a unique PMU:
| # ls /sys/bus/event_source/devices/
| armv8_pmuv3_0 armv8_pmuv3_3 armv8_pmuv3_6 software
| armv8_pmuv3_1 armv8_pmuv3_4 armv8_pmuv3_7 tracepoint
| armv8_pmuv3_2 armv8_pmuv3_5 breakpoint
* On a system where all CPUs have a unique MIDR, late-onlining a CPU
results in that CPU not being associated with a PMU, but the CPU is
successfully onlined:
| # echo 1 > /sys/devices/system/cpu/cpu8/online
| Detected PIPT I-cache on CPU8
| GICv3: CPU8: found redistributor 8 region 0:0x00000000081a0000
| GICv3: CPU8: using allocated LPI pending table @0x0000000040290000
| Unable to associate CPU8 with a PMU
| CPU8: Booted secondary processor 0x0000000008 [0x431f0af1]
Thanks,
Mark.
Mark Rutland (3):
arm_pmu: acpi: factor out PMU<->CPU association
arm_pmu: factor out PMU matching
arm_pmu: rework ACPI probing
drivers/perf/arm_pmu.c | 17 +-----
drivers/perf/arm_pmu_acpi.c | 113 ++++++++++++++++++++---------------
include/linux/perf/arm_pmu.h | 1 -
3 files changed, 69 insertions(+), 62 deletions(-)
--
2.30.2