arm64: csdlock at early boot due to slow serial (?)
Leo Yan
leo.yan at arm.com
Thu Jul 10 06:35:57 PDT 2025
Hi Breno,
On Wed, Jul 09, 2025 at 07:23:44AM -0700, Breno Leitao wrote:
> On Tue, Jul 08, 2025 at 07:00:45AM -0700, Breno Leitao wrote:
> > On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote:
> >
> > Here is more information I got about this problem. TL;DR: While the
> > machine is booting, it is throttled by the UART speed, while having IRQ
> > disabled.
>
> quick update: I've identified a solution that significantly improves the
> situation. I've found that the serial issue was heavily affecting boot
> time, which is unleashed now.
>
> After applying the following fix, the boot speed has improved
> dramatically. It's the fastest I've seen, and the CSD lockups are gone.
Thanks for trying to fix the issue.
> If no concerns are raised in the next few days, I will send it officially
> to the serial maintainers.
I am not an expert on the PL011 driver; however, I do have concerns
after reviewing the change. Please see my comments below.
> Author: Breno Leitao <leitao at debian.org>
> Date: Wed Jul 9 05:57:06 2025 -0700
>
> serial: amba-pl011: Fix boot performance by switching to console_initcall()
>
> Replace arch_initcall() with console_initcall() for PL011 driver initialization
> to resolve severe boot performance issues.
pl011_init() registers an AMBA driver, so the PL011 driver depends on
the AMBA bus being initialized first. The AMBA bus is initialized with:

  postcore_initcall(amba_init);

Therefore, the PL011 driver is initialized with arch_initcall(), which
runs later than the postcore initcalls.
My understanding is that console_initcall() is invoked much earlier
than the other initcalls triggered by do_initcalls(). With your change,
I saw the PL011 driver fail to register on a Juno-r2 board, because the
AMBA bus is not yet ready at console init time:

  Driver 'uart-pl011' was unable to register with bus_type 'amba'
  because the bus was not initialized.
> The current arch_initcall() registration causes the console to initialize
> before the printk subsystem is ready, forcing the driver into atomic mode
> during early boot. This results in:
>
> - 5-8 second boot delay while ~700 boot messages are processed
> - System freeze with IRQs disabled during message output
> - Each character transmitted synchronously with cpu_relax() polling
>
> This is what is driving the driver to atomic mode in the early boot:
>
>   static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>   {
>           ....
>           if (printk_kthreads_running)
>                   ft->nbcon_offload = true;
>
> The atomic path processes each character individually through
> pl011_console_putchar(), waiting for UART transmission completion
> before proceeding. With only one CPU online during early boot,
> this creates a bottleneck where the system spends excessive time
> in interrupt-disabled state.
The atomic path was introduced recently by commit:

  2eb2608618ce ("serial: amba-pl011: Implement nbcon console")
My conclusion is that changing the initcall does not disable the atomic
path. Instead, switching to console_initcall() makes the AMBA driver
registration fail, and as a result the clock operations are never
invoked.

Thus, I am curious whether you have ruled out that the issue is caused
by the UART clock (as I mentioned in another reply).
BTW, since the atomic path was enabled by commit 2eb2608618ce, what is
the result after reverting that commit?
Thanks,
Leo
> Here is what the code looks like:
>
> 1) disable interrupts
> 2) for each of these 700 messages, call pl011_console_write_atomic()
> 3) for each character in the message, call pl011_console_putchar(),
> which waits for the character to be transmitted
> 4) once the whole line is transmitted, wait for the UART to be idle
> 5) re-enable interrupts
>
> Here is the code representation of the above:
>
>   pl011_console_write_atomic() {
>           ...
>           // For each char in the message
>           pl011_console_putchar() {
>                   while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF)
>                           cpu_relax();
>           }
>           while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
>                   cpu_relax();
>   }
>
> Using console_initcall() ensures proper initialization order,
> allowing the printk subsystem to use threaded output instead
> of atomic mode, eliminating the performance bottleneck.
>
> Performance improvement: 16x faster kernel boot time at my GRACE SoC
> machine.
>
> - Before: 10.08s to reach init process
> - After: 0.62s to reach init process
>
> Here are more timing details, collected from Linus' upstream, where the
> only difference is this patch:
>
> With this patch:
> [ 0.616203] printk: legacy console [netcon_ext0] enabled
> [ 0.627469] Run /init as init process
> [ 0.837477] loop: module loaded
> [ 8.354803] Adding 134199360k swap on /swapvol/swapfile.
>
> Linus upstream:
> [ 0.305109] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
> [ 10.081742] Run /init as init process
> [ 13.288717] loop: module loaded
> [ 22.919934] Adding 134199168k swap on /swapvol/swapfile.
>
> Link: https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/ [1]
>
> Signed-off-by: Breno Leitao <leitao at debian.org>
>
> diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
> index 22939841b1de..0cf251365825 100644
> --- a/drivers/tty/serial/amba-pl011.c
> +++ b/drivers/tty/serial/amba-pl011.c
> @@ -3116,7 +3116,7 @@ static void __exit pl011_exit(void)
> * While this can be a module, if builtin it's most likely the console
> * So let's leave module_exit but move module_init to an earlier place
> */
> -arch_initcall(pl011_init);
> +console_initcall(pl011_init);
> module_exit(pl011_exit);
>
> MODULE_AUTHOR("ARM Ltd/Deep Blue Solutions Ltd");