arm64: csdlock at early boot due to slow serial (?)
Breno Leitao
leitao at debian.org
Tue Jul 8 07:00:45 PDT 2025
On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote:
> > How do I find the SoC exactly?
>
> >From what you've told me above, the SoC is Nvidia Grace; what they call
> the CPU is the whole SoC.
>
> > > Likewise that might imply more folk to add to Cc.
>
> I've added Ankit and Besar, since they've both worked on some system
> level bits on Grace, and might have an idea.
>
> Ankit, Besar, are you aware of any UART issues on Grace (as described in
> Breno's messages below), or do you know of anyone who might have an
> idea?
Here is more information I got about this problem. TL;DR: While the
machine is booting, it is throttled by the UART speed, while having IRQ
disabled.
Here is my line of investigation:
1) when the kernel start to flush the messages, there are ~700 messages
in the buffer that will be flushed using
pl011_console_write_atomic()
[ 0.653673] printk: console [ttyAMA0] enabled
[ 9.239225] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
a) pl011_console_write_atomic() is called ~700 times. Each message
have around 50 chars.
b) For each char, it needs to wait for it to be sent before proceeding.
3) pl011_console_write_atomic() have two big while()/cpu_relax() blocks.
One per character and one per message.
a) the first one is in pl011_console_putchar(), where it checks if the
FIFO is full before writing a single character. This is done one per
character.
while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF)
cpu_relax();
b) After the line is written, then it hits the second one, in
pl011_console_write_atomic(), where it waits until the controller is
not busy anymore:
while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
cpu_relax();
4) The controller has FIFO enabled, but, it doesn't help much, because
the CPU is waiting for the 700 * 50 chars to be transmitted.
5) At this time we just have one single CPU online (?)
6) This is done with IRQ disabled, which causes a CSD lock to trigger,
given that the CPU has IRQ disabled
a) Following Paul's suggestion, I added
a `lockdep_assert_irqs_enabled()` at the entrance of
pl011_console_write_atomic(), and that is what I see.
hardirqs last enabled at (256267): [<ffff800080364980>] vprintk_store+0x6f8/0x820
hardirqs last disabled at (256268): [<ffff80008036e1c4>] __nbcon_atomic_flush_pending+0x2c4/0x3e8
softirqs last enabled at (255640): [<ffff8000801bb020>] handle_softirqs+0x910/0xe78
softirqs last disabled at (255635): [<ffff800080010938>] __do_softirq+0x18/0x20
b) This is the full stack in how we are calling cpu_relax() with irq
disabled million times:
WARNING: CPU: 0 PID: 1 at drivers/tty/serial/amba-pl011.c:2545 pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545)
Modules linked in:
pstate: 234003c9 (nzCv DAIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545)
lr : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545)
sp : ffff80008980dca0
pmr: 000000f0
x29: ffff80008980dca0 x28: ffff0000aebcc080 x27: dfff800000000000
x26: 1fffe00015d7985a x25: 0000000000000000 x24: 0000000000000000
x23: ffff0000a1800000 x22: ffff800087e6d7f8 x21: 1fffe00015d7985b
x20: ffff80008980dee0 x19: 0000000000000003 x18: 00000000ffffffff
x17: 30313478305b2030 x16: 3030303230303030 x15: 3078302055504320
x14: 6c61636973796870 x13: 0a2a2a2065676173 x12: ffff700011301b8b
x11: 1ffff00011301b8a x10: ffff700011301b8a x9 : ffff800081eafdb4
x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000000
x5 : 1ffff00011301be2 x4 : 0000000000000018 x3 : 0000000000000000
x2 : 1ffff00010fcdb57 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545) (P)
nbcon_emit_next_record (kernel/printk/nbcon.c:1026)
__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
vprintk_emit (kernel/printk/printk.c:2429)
vprintk_default (kernel/printk/printk.c:2466)
vprintk (kernel/printk/printk_safe.c:83)
_printk (kernel/printk/printk.c:2470)
__warn (kernel/panic.c:768)
report_bug (lib/bug.c:197 lib/bug.c:215)
bug_handler (arch/arm64/kernel/traps.c:1002)
call_break_hook (arch/arm64/kernel/debug-monitors.c:319)
brk_handler (arch/arm64/kernel/debug-monitors.c:325)
do_debug_exception (arch/arm64/mm/fault.c:1002)
el1_dbg (arch/arm64/kernel/entry-common.c:514)
el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:567)
el1h_64_sync (arch/arm64/kernel/entry.S:595)
pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2524) (P)
nbcon_emit_next_record (kernel/printk/nbcon.c:1026)
__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
vprintk_emit (kernel/printk/printk.c:2429)
vprintk_default (kernel/printk/printk.c:2466)
vprintk (kernel/printk/printk_safe.c:83)
_printk (kernel/printk/printk.c:2470)
register_console (kernel/printk/printk.c:4126 (discriminator 9))
serial_core_register_port (drivers/tty/serial/serial_core.c:2637 drivers/tty/serial/serial_core.c:3157 drivers/tty/serial/serial_core.c:3388)
serial_ctrl_register_port (drivers/tty/serial/serial_ctrl.c:42)
uart_add_one_port (drivers/tty/serial/serial_port.c:144)
pl011_register_port (drivers/tty/serial/amba-pl011.c:2867)
sbsa_uart_probe (drivers/tty/serial/amba-pl011.c:3041)
platform_probe (drivers/base/platform.c:1405)
really_probe (drivers/base/dd.c:579 drivers/base/dd.c:657)
__driver_probe_device (drivers/base/dd.c:799)
driver_probe_device (drivers/base/dd.c:829)
__device_attach_driver (drivers/base/dd.c:958)
bus_for_each_drv (drivers/base/bus.c:462)
__device_attach (drivers/base/dd.c:1031)
device_initial_probe (drivers/base/dd.c:1079)
bus_probe_device (drivers/base/bus.c:537)
device_add (drivers/base/core.c:3699)
platform_device_add (drivers/base/platform.c:716)
platform_device_register_full (drivers/base/platform.c:845)
acpi_create_platform_device (./include/linux/err.h:70 drivers/acpi/acpi_platform.c:178)
acpi_bus_attach (./include/linux/acpi.h:738 drivers/acpi/scan.c:2206 drivers/acpi/scan.c:2198 drivers/acpi/scan.c:2316)
acpi_dev_for_one_check (drivers/acpi/bus.c:1146)
device_for_each_child (drivers/base/core.c:4021)
acpi_dev_for_each_child (drivers/acpi/bus.c:1151)
acpi_bus_attach (drivers/acpi/scan.c:2323)
acpi_dev_for_one_check (drivers/acpi/bus.c:1146)
device_for_each_child (drivers/base/core.c:4021)
acpi_dev_for_each_child (drivers/acpi/bus.c:1151)
acpi_bus_attach (drivers/acpi/scan.c:2323)
acpi_bus_scan (drivers/acpi/scan.c:2533 drivers/acpi/scan.c:2606)
acpi_scan_init (drivers/acpi/scan.c:2739)
acpi_init (drivers/acpi/bus.c:1470)
do_one_initcall (init/main.c:1274)
kernel_init_freeable (init/main.c:1335 init/main.c:1352 init/main.c:1371 init/main.c:1584)
kernel_init (init/main.c:1476)
ret_from_fork (arch/arm64/kernel/entry.S:863)
irq event stamp: 256268
hardirqs last enabled at (256267): vprintk_store (kernel/printk/printk.c:2356 (discriminator 1))
hardirqs last disabled at (256268): __nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1539 kernel/printk/nbcon.c:1593)
softirqs last enabled at (255640): handle_softirqs (./arch/arm64/include/asm/current.h:19 ./arch/arm64/include/asm/preempt.h:13 kernel/softirq.c:426 kernel/softirq.c:607)
softirqs last disabled at (255635): __do_softirq (kernel/softirq.c:614)
---[ end trace 0000000000000000 ]---
More information about the linux-arm-kernel
mailing list