[BUG] rockpro64 board hangs in console_init() after commit 10e14073107d

Tue Jun 14 05:23:32 PDT 2022

On Mon 2022-06-13 17:54:35, Alexandru Elisei wrote:
> Hi,
> 
> (Apologies for the long To: list, I've added everyone that
> scripts/get_maintainer.pl listed for fs/{fs-writeback,inode}.c, for the
> rockchip boards, for printk.c and for the tty layer)
> 
> When booting a kernel built from tag v5.19-rc2, my rockpro64-v2 hangs at
> boot with this log:
> 
> [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> [    0.000000] Linux version 5.19.0-rc2 (alex at redacted) (aarch64-linux-gnu-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #90 SMP PREEMPT Mon Jun 13 17:13:12 BST 2022
> [    0.000000] Machine model: Pine64 RockPro64 v2.0
> [    0.000000] efi: UEFI not found.
> [    0.000000] earlycon: uart0 at MMIO32 0x00000000ff1a0000 (options '1500000n8')
> [    0.000000] printk: bootconsole [uart0] enabled
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 0x0000000000200000-0x00000000f7ffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0xf77dc140-0xf77ddfff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000f7ffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000f7ffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000f7ffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] cma: Reserved 32 MiB at 0x00000000f0000000
> [    0.000000] psci: probing for conduit method from DT.
> [    0.000000] psci: PSCIv1.1 detected in firmware.
> [    0.000000] psci: Using standard PSCI v0.2 function IDs
> [    0.000000] psci: MIGRATE_INFO_TYPE not supported.
> [    0.000000] psci: SMC Calling Convention v1.2
> [    0.000000] percpu: Embedded 20 pages/cpu s41768 r8192 d31960 u81920
> [    0.000000] Detected VIPT I-cache on CPU0
> [    0.000000] CPU features: detected: GIC system register CPU interface
> [    0.000000] CPU features: detected: ARM erratum 845719
> [    0.000000] Fallback order for Node 0: 0
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 999432
> [    0.000000] Policy zone: DMA
> [    0.000000] Kernel command line: root=PARTUUID=7f4aab92-69d8-47f3-be10-624da40a71f9 rw earlycon rootwait
> [    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 3915796K/4061184K available (15552K kernel code, 2758K rwdata, 8668K rodata, 6336K init, 564K bss, 112620K reserved, 32768K cma-reserved)
> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
> [    0.000000] rcu: Preemptible hierarchical RCU implementation.
> [    0.000000] rcu: 	RCU event tracing is enabled.
> [    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=6.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
> [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> [    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
> [    0.000000] GICv3: 256 SPIs implemented
> [    0.000000] GICv3: 0 Extended SPIs implemented
> [    0.000000] Root IRQ handler: gic_handle_irq
> [    0.000000] GICv3: GICv3 features: 16 PPIs
> [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000fef00000
> [    0.000000] ITS [mem 0xfee20000-0xfee3ffff]
> [    0.000000] ITS at 0x00000000fee20000: allocated 65536 Devices @480000 (flat, esz 8, psz 64K, shr 0)
> [    0.000000] ITS: using cache flushing for cmd queue
> [    0.000000] GICv3: using LPI property table @0x0000000000440000
> [    0.000000] GIC: using cache flushing for LPI property table
> [    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000000450000
> [    0.000000] GICv3: GIC: PPI partition interrupt-partition-0[0] { /cpus/cpu at 0[0] /cpus/cpu at 1[1] /cpus/cpu at 2[2] /cpus/cpu at 3[3] }
> [    0.000000] GICv3: GIC: PPI partition interrupt-partition-1[1] { /cpus/cpu at 100[4] /cpus/cpu at 101[5] }
> [    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    0.000000] arch_timer: cp15 timer(s) running at 24.00MHz (phys).
> [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x588fe9dc0, max_idle_ns: 440795202592 ns
> [    0.000001] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 4398046511097ns
> [    0.005107] Console: colour dummy device 80x25
> [    0.005549] printk: console [tty0] enabled
> [    0.005956] printk: bootconsole [uart0] disabled
> 
> Config can be found at [1] (expires after 6 months). I've also built the
> kernel with gcc 10.3.1 [2] (aarch64-none-linux-gnu), same issue.
> 
> I've bisected the build failure to commit 10e14073107d ("writeback: Fix
> inode->i_io_list not be protected by inode->i_lock error"); I've confirmed
> that that commit is responsible by successfully booting the board with a
> kernel built from v5.19-rc2 + the above commit reverted.

It is strange. I can't see how consoles are related to filesystem
writeback.

Anyway, the commit 10e14073107d ("writeback: Fix inode->i_io_list not
be protected by inode->i_lock error") modifies some locking and
might be source of possible deadlocks.

I am not familiar with the fs code. But I noticed the following.
The patch adds:

+               if (!was_dirty) {
+                       wb = locked_inode_to_wb_and_lock_list(inode);
+                       spin_lock(&inode->i_lock);

And locked_inode_to_wb_and_lock_list() is defined this way:

/**
 * locked_inode_to_wb_and_lock_list - determine a locked inode's wb and lock it
 * @inode: inode of interest with i_lock held
 *
 * Returns @inode's wb with its list_lock held.  @inode->i_lock must be
 * held on entry and is released on return.  The returned wb is guaranteed
 * to stay @inode's associated wb until its list_lock is released.
 */
static struct bdi_writeback *
locked_inode_to_wb_and_lock_list(struct inode *inode)
	__releases(&inode->i_lock)
	__acquires(&wb->list_lock)
{
	while (true) {
		struct bdi_writeback *wb = inode_to_wb(inode);

		/*
		 * inode_to_wb() association is protected by both
		 * @inode->i_lock and @wb->list_lock but list_lock nests
		 * outside i_lock.  Drop i_lock and verify that the
		 * association hasn't changed after acquiring list_lock.
		 */
		wb_get(wb);
		spin_unlock(&inode->i_lock);

It expects that inode->i_lock is taken before. But the problematic
commit takes it later. It might mess the lock and cause a deadlock.

Jack?

> I tried to do some investigating, it seems that the kernel is stuck at
> printk.c::console_init() -> drivers/tty/vt/vt.c::con_init() ->
> printk.c::register_console() -> unregister_console() -> console_lock().
> This has been determined by adding pr_info statements.

So, you tried something like:

int unregister_console(struct console *console)
{
	struct task_struct *thd;
	struct console *con;
	int res;

	con_printk(KERN_INFO, console, "disabled\n");
[...]

+	pr_info("Stage 1\n");
	console_lock();
+	pr_info("Stage 2\n");

[...]

	console_unlock();
+	pr_info("Stage 3\n");
[...]
}

And "Stage 1" was the last message on the console ?

Does the system boot when you avoid "earlycon" parameter?

> I've booted a kernel compiled with CONFIG_PROVE_LOCKING=y, as the offending
> commit fiddles with locks, but no splat was produced that would explain the
> hang. I've also tried to boot a v5,19-rc2 kernel on my odroid-c4, the board
> is booting just fine, so I'm guessing it only affects of subset of arm64
> boards.

You might try to switch the order of console_init() and lockdep_init()
in start_kernel() in init/main.c

Best Regards,
Petr