arm64: kernel panic on 4G RAM platform

Will Deacon will at kernel.org
Mon Jun 3 06:34:06 PDT 2024


On Mon, Jun 03, 2024 at 08:19:55AM +0200, Alexander Wilhelm wrote:
> On Wed, May 29, 2024 at 04:47:40PM +0100, Will Deacon wrote:
> > On Thu, May 23, 2024 at 10:16:56AM +0200, Alexander Wilhelm wrote:
> > > Hello ARM64 developers,
> > > 
> > > I am seeing a kernel panic on my ARM64 board, but I'm not sure whether the
> > > problem is in the kernel or somewhere else. Maybe someone can help me.
> > > 
> > > My problem is the following: I'm using the NXP TQ board with ARM64 architecture
> > > to run the OpenWrt operating system with Linux kernel v5.15. The current
> > > revision of the board (TQMLS1046A-CB.0203) now has 4GiB of RAM instead of 2GiB.
> > > Therefore I adapted U-Boot to use the entire memory, but this now leads to a
> > > kernel crash. Interestingly, if I only use 2GiB the problem doesn't occur.
> > > The memory is split into two different banks.
> > > 
> > > While analyzing the problem I tried to narrow down its source. But with
> > > each new "print message" that should help me trace the problem, I get a
> > > different one. It seems like the error happens unpredictably, as if due to a
> > > race condition or a bad memory access. Then I tried different RAM sizes,
> > > something in between, like 3GiB. I could boot successfully, but then I got
> > > errors from "swiotlb" when my wireless driver tried to allocate memory from
> > > my CMA pool. I understand that the error description is very vague, but I am
> > > doing my best to explain my problem. Please refer to my log of the current
> > > kernel panic state:
> > > 
> > > Starting kernel ...
> > > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd082]
> > > [    0.000000] Linux version 5.15.158 (########@##############) (aarch64-openwrt-linux-gnu-gcc (OpenWrt GCC 12.3.0 r23630-842932a63d) 12.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Wed May 22 13:15:09 2024
> > > [    0.000000] Machine model: #########
> > > [    0.000000] earlycon: uart8250 at MMIO 0x00000000021c0500 (options '')
> > > [    0.000000] printk: bootconsole [uart8250] enabled
> > > [    0.000000] Reserved memory: created DMA memory pool at 0x00000008ff800000, size 8 MiB
> > > [    0.000000] OF: reserved mem: initialized node qman-fqd, compatible id shared-dma-pool
> > > [    0.000000] Reserved memory: created DMA memory pool at 0x00000008fc000000, size 32 MiB
> > > [    0.000000] OF: reserved mem: initialized node qman-pfdr, compatible id shared-dma-pool
> > > [    0.000000] Reserved memory: created DMA memory pool at 0x00000008fe000000, size 16 MiB
> > > [    0.000000] OF: reserved mem: initialized node bman-fbpr, compatible id shared-dma-pool
> > > [    0.000000] Zone ranges:
> > > [    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
> > > [    0.000000]   DMA32    empty
> > > [    0.000000]   Normal   [mem 0x0000000100000000-0x00000008ffffffff]
> > > [    0.000000] Movable zone start for each node
> > > [    0.000000] Early memory node ranges
> > > [    0.000000]   node   0: [mem 0x0000000080000000-0x00000000fbdfffff]
> > > [    0.000000]   node   0: [mem 0x0000000880000000-0x00000008fbffffff]
> > > [    0.000000]   node   0: [mem 0x00000008fc000000-0x00000008feffffff]
> > > [    0.000000]   node   0: [mem 0x00000008ff000000-0x00000008ff7fffff]
> > > [    0.000000]   node   0: [mem 0x00000008ff800000-0x00000008ffffffff]
> > > [    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000008ffffffff]
> > > [    0.000000] On node 0, zone Normal: 16896 pages in unavailable ranges
> > > [    0.000000] cma: Reserved 192 MiB at 0x00000000ee800000
> > > [    0.000000] Failed to find device node for boot cpu
> > > [    0.000000] /cpus/cpu@0: missing reg property
> > > [    0.000000] /cpus/cpu@1: missing reg property
> > > [    0.000000] /cpus/cpu@2: missing reg property
> > > [    0.000000] /cpus/cpu@3: missing reg property
> > > [    0.000000] Number of cores (5) exceeds configured maximum of 2 - clipping
> > > [    0.000000] missing boot CPU MPIDR, not enabling secondaries
> > 
> > I'd start by fixing this bit ^^^ If the secondary CPUs are spinning
> > somewhere in memory, maybe that gets allocated by Linux and you end up
> > with them executing random instructions?
> > 
> 
> It sounds very much like it. I already suspected that the CPUs were getting in
> each other's way. Unfortunately I could not reduce the CPUs to 1 in my kernel
> configuration. For some reason, 2 is the minimum number. If you could give me
> some tips on how to narrow down the problem, I would appreciate it.

I'm not really familiar with the FSL SoCs, so I'm adding a bunch of folks
who are, in case they have any ideas about what you could try next.
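
As a quick sanity check on the memory map in the quoted log, here is a
minimal Python sketch that sums the "Early memory node ranges" and checks
where the CMA reservation lands. It uses only addresses printed in the log;
the helper names are hypothetical and the 4 KiB page size is an assumption.

```python
# Early memory node ranges copied from the quoted boot log (inclusive ends).
NODE_RANGES = [
    (0x0000000080000000, 0x00000000FBDFFFFF),
    (0x0000000880000000, 0x00000008FBFFFFFF),
    (0x00000008FC000000, 0x00000008FEFFFFFF),
    (0x00000008FF000000, 0x00000008FF7FFFFF),
    (0x00000008FF800000, 0x00000008FFFFFFFF),
]

MIB = 1 << 20
PAGE_SIZE = 4096          # assumption: 4 KiB pages
EXPECTED_RAM_MIB = 4096   # the board is described as having 4GiB

# CMA reservation from the log: "cma: Reserved 192 MiB at 0x00000000ee800000"
CMA_START = 0x00000000EE800000
CMA_SIZE = 192 * MIB


def size_mib(start, end):
    """Size in MiB of a range with an inclusive end address."""
    return (end - start + 1) // MIB


def total_mib(ranges):
    """Total RAM in MiB covered by the given ranges."""
    return sum(size_mib(start, end) for start, end in ranges)


def missing_pages(ranges, expected_mib=EXPECTED_RAM_MIB):
    """Pages by which the ranges fall short of the expected RAM size."""
    return (expected_mib - total_mib(ranges)) * MIB // PAGE_SIZE


def range_containing(start, size, ranges):
    """Return the first range that fully contains [start, start + size)."""
    last = start + size - 1
    for lo, hi in ranges:
        if lo <= start and last <= hi:
            return (lo, hi)
    return None


if __name__ == "__main__":
    for lo, hi in NODE_RANGES:
        print(f"{lo:#018x}-{hi:#018x}: {size_mib(lo, hi)} MiB")
    print("total:", total_mib(NODE_RANGES), "MiB")
    print("short of 4GiB by:", missing_pages(NODE_RANGES), "pages")
    print("CMA lies in:", range_containing(CMA_START, CMA_SIZE, NODE_RANGES))
```

Under those assumptions the ranges add up to 4030 MiB, which is 16896 pages
short of 4GiB, the same count the log reports as "pages in unavailable
ranges", and the CMA pool sits entirely inside the first (low) bank.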

Will


