CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
Breno Leitao
leitao at debian.org
Mon Dec 1 09:04:07 PST 2025
Hello Petr,
On Fri, Nov 28, 2025 at 05:08:17PM +0100, Petr Mladek wrote:
> On Tue 2025-11-25 08:02:16, Breno Leitao wrote:
>
> I do _not_ think that the CPU was waiting in pl011_console_write_atomic() in
> the following cycle the entire 11 secs:
>
> 	while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
> 		cpu_relax();
>
> A more likely scenario was that pl011_console_write_atomic() was
> called several times during this period because there were more
> pending messages.
Probably. Most of the messages are coming from CPUs being powered off:
[ 44.119433] psci: CPU1 killed (polled 0 ms)
[ 44.146057] psci: CPU2 killed (polled 0 ms)
[ 44.182058] psci: CPU3 killed (polled 0 ms)
[ 44.218031] psci: CPU4 killed (polled 0 ms)
[ 44.252962] psci: CPU5 killed (polled 0 ms)
[ 44.276939] psci: CPU6 killed (polled 0 ms)
[ 44.296152] psci: CPU7 killed (polled 1 ms)
....
And this only happens on "large" machines, so the host is flushing
a lot of messages during kexec teardown.
> > printk_kthreads_shutdown (kernel/printk/printk.c:?)
>
> But the function seems to be called with IRQs enabled. So it might
> help to restore IRQs after each flushed message.
Agreed. This would make the irq-disabled sections much smaller, with
a higher chance of IPIs and NMIs getting through (on arm64 hosts without FEAT_NMI).
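
As a rough illustration of why restoring IRQs after each record helps, here is
a small user-space model. It is not the printk/nbcon code: the function names
and the per-record cost array are hypothetical, and it only compares the
longest irq-disabled window under the two strategies.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: emit_cost_us[i] is the time spent pushing record i
 * out to the console while IRQs are disabled.
 */

/* Longest irq-disabled window when IRQs stay off for the whole flush. */
uint64_t max_irqoff_whole_flush(const uint64_t *emit_cost_us, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += emit_cost_us[i];
	return total;
}

/* Longest irq-disabled window when IRQs are restored after each record. */
uint64_t max_irqoff_per_record(const uint64_t *emit_cost_us, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (emit_cost_us[i] > max)
			max = emit_cost_us[i];
	return max;
}
```

In this model the whole-flush window grows with the number of pending
records, while the per-record window is bounded by the slowest single record,
so a pending IPI waits for at most one record instead of the entire backlog.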
> But we could extend the existing commit d5d399efff6577 ("printk/nbcon:
> Release nbcon consoles ownership in atomic flush after each emitted
> record") and restore IRQs after each emitted record.
>
> I wonder if the following patch would help in this scenario.
> It is made on top of "for-next" branch in printk/linux.git.
> But the most important pre-requisite is the above mentioned commit
> in the branch "rework/atomic-flush-hardlockup".
>
> Note that the patch is only compile tested.
I've tested the patch and I don't see the CSD lockups anymore.
Thanks for the quick fix.
> Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
> Signed-off-by: Petr Mladek <pmladek at suse.com>
Tested-by: Breno Leitao <leitao at debian.org>
Thanks to everyone involved here. With this last patch (which makes
the irq-disabled section smaller), and kfence not IPIing during kexec
time, I consider this issue closed.
--breno