arm64 torture test hotplug failures (offlining causes -EBUSY)

Marc Zyngier maz at kernel.org
Mon Jan 16 10:03:14 PST 2023


Hi Joel,

On Mon, 16 Jan 2023 17:03:31 +0000,
Joel Fernandes <joel at joelfernandes.org> wrote:
> 
> Hello,
> I am seeing -EBUSY returned a lot during torture_onoff() when running
> rcutorture on arm64. This causes hotplug failure 30% of the time. I am
> also seeing this in 6.1-rc kernels. I believe I see this only for CPU0.
> 
> This causes warnings in torture tests:
> [  217.582290] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> [  221.866362] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> 
> Full kernel log here:
> http://box.joelfernandes.org:9080/job/rcutorture_stable_arm/job/linux-5.15.y/7/artifact/tools/testing/selftests/rcutorture/res/2023.01.15-14.51.11/TREE04/console.log
> 
> Any ideas on why this is happening and only for CPU 0 (presumably the
> boot CPU)? I'd personally need these warnings to go away for my tests
> as this causes rcutorture's tests to not cleanly pass for me. It
> appears remove_cpu() -> device_offline() is what returns the error.

I've taken your kernel for a ride as a KVM guest (probably similar to
what you are doing), and saw the same thing (CPU0 not offlining):

[   64.555845] Detected VIPT I-cache on CPU4
[   64.556146] GICv3: CPU4: found redistributor 4 region 0:0x000000003ff70000
[   64.556689] CPU4: Booted secondary processor 0x0000000004 [0x612f0290]
[   69.823670] rcu-torture:torture_onoff task: offline 0 failed: errno -16
[   73.991960] psci: CPU7 killed (polled 0 ms)
[   74.239626] rcu-torture: rcu_torture_read_exit: Start of episode
[   74.243863] rcu-torture: rcu_torture_read_exit: End of episode

I then tried v6.2-rc4 with defconfig + RCU_TORTURE and your command
line, and CPU0 does seem to hotplug off correctly:

[   47.217109] psci: CPU3 killed (polled 0 ms)
[   52.241009] Detected VIPT I-cache on CPU3
[   52.241227] cacheinfo: Unable to detect cache hierarchy for CPU 3
[   52.241481] GICv3: CPU3: found redistributor 3 region 0:0x000000003ff50000
[   52.241849] CPU3: Booted secondary processor 0x0000000003 [0x612f0290]
[   56.337011] psci: CPU0 killed (polled 0 ms)
[...]
[  121.090339] rcu-torture: Free-Block Circulation:  922 920 919 918 917 916 914 913 912 911 0
[  125.574311] Detected VIPT I-cache on CPU0
[  125.574557] cacheinfo: Unable to detect cache hierarchy for CPU 0
[  125.574901] GICv3: CPU0: found redistributor 0 region 0:0x000000003fef0000
[  125.575322] CPU0: Booted secondary processor 0x0000000000 [0x612f0290]
[  130.176893] rcu-torture: rcu_torture_read_exit: Start of episode
[  130.317001] psci: CPU0 killed (polled 0 ms)
[...]
[  225.588999] Detected VIPT I-cache on CPU0
[  225.589224] cacheinfo: Unable to detect cache hierarchy for CPU 0
[  225.589535] GICv3: CPU0: found redistributor 0 region 0:0x000000003fef0000
[  225.589946] CPU0: Booted secondary processor 0x0000000000 [0x612f0290]

No such error is being reported.
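
To compare two console logs without reading them by eye, the offline
failures can be tallied per CPU. The following is only a minimal sketch,
assuming the failure lines have exactly the format shown in the logs
above; the function name count_offline_failures is made up for
illustration and is not part of rcutorture.

```python
import re
from collections import Counter

# Matches lines of the form seen in the logs above, e.g.:
# [  217.582290] rcu-torture:torture_onoff task: offline 0 failed: errno -16
# Assumption: the message format is stable across the kernels being compared.
OFFLINE_FAILED_RE = re.compile(
    r"torture_onoff task: offline (\d+) failed: errno (-?\d+)"
)


def count_offline_failures(log_text):
    """Return a Counter keyed by (cpu, errno) for each failed offline attempt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OFFLINE_FAILED_RE.search(line)
        if match:
            cpu = int(match.group(1))
            errno_value = int(match.group(2))
            counts[(cpu, errno_value)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python3 this_script.py console.log
    with open(sys.argv[1], errors="replace") as log_file:
        for (cpu, err), n in sorted(count_offline_failures(log_file.read()).items()):
            print(f"CPU{cpu}: {n} failed offline attempt(s), errno {err}")
```

An empty result for one log and a non-empty one for the other would
match what the two runs above show.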

Is there anything special in your config that would help trigger
this with the current tip of tree?
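
One way to answer that is to diff the two .config files. Below is a
rough sketch of such a comparison, assuming the usual .config syntax
("CONFIG_FOO=value" and "# CONFIG_FOO is not set"); the helper names
parse_config and diff_configs are hypothetical.

```python
def parse_config(text):
    """Map CONFIG_* option names to their values.

    A '# CONFIG_FOO is not set' line is recorded as 'n'. Options that are
    absent from the file altogether are simply missing from the result.
    """
    suffix = " is not set"
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(first_text, second_text):
    """Return {option: (value_in_first, value_in_second)} for differing options.

    An option missing from one file shows up with None on that side.
    """
    first = parse_config(first_text)
    second = parse_config(second_text)
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(set(first) | set(second))
        if first.get(name) != second.get(name)
    }


if __name__ == "__main__":
    import sys

    # Usage: python3 this_script.py failing.config working.config
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for name, (old, new) in diff_configs(f1.read(), f2.read()).items():
            print(f"{name}: {old} -> {new}")
```

Note that this treats an absent option and an explicit "is not set" as
different, which may produce some noise between kernel versions.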

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.