[bug] aarch64 host no longer boots after 767507654c22 ("arch_numa: switch over to numa_memblks")
Jan Stancek
jstancek at redhat.com
Tue Oct 29 14:03:31 PDT 2024
On Tue, Oct 29, 2024 at 5:24 PM Mike Rapoport <rppt at kernel.org> wrote:
>
> On Tue, Oct 29, 2024 at 04:43:39PM +0100, Jan Stancek wrote:
> > On Tue, Oct 29, 2024 at 4:07 PM Zi Yan <ziy at nvidia.com> wrote:
> > >
> > > +tegra mailing list and maintainers
> > >
> > > On 29 Oct 2024, at 8:47, Jan Stancek wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm seeing a regression on Nvidia IGX system, which no longer boots.
> > > >
> > > > bisect points at commit 767507654c22 ("arch_numa: switch over to numa_memblks").
> > > > It hangs very early, with 4k or 64k pages, with no kernel messages printed:
> > > >
> > > > EFI stub: Booting Linux Kernel...
> > > > EFI stub: Using DTB from configuration table
> > > > EFI stub: Exiting boot services...
> > > > <hangs here>
> > > >
> > >
> > > Is it possible to have earlycon output? It is hard to debug without any
> > > information except kernel fails to boot.
> >
> > I know it was a long shot; so far I haven't had any luck getting it to work.
>
> Does it boot with numa=off and numa=fake?
No, it doesn't.
>
> In the log from the successful boot it seems there is no NUMA information in
> the device tree, can you send the device tree as well please?
https://people.redhat.com/jstancek/aarch64_numa_boot/device_tree
Regards,
Jan
>
> > > Since the previous commit boots and I assume both kernels are compiled
> > > with the same gcc toolchain, this should not be caused by the binutils
> > > bug in 2.42[1]. Is your binutils version 2.42?
> >
> > Yes, both are compiled locally, with binutils 2.41
> >
> > >
> > > Thanks.
> > >
> > >
> > > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=31924
> > >
> > > > Here's a log from successful boot with previous commit:
> > > > https://people.redhat.com/jstancek/aarch64_numa_boot/console-log-good.txt
> > > > and config: https://people.redhat.com/jstancek/aarch64_numa_boot/config
> > > >
> > > > # lscpu
> > > > Architecture: aarch64
> > > > CPU op-mode(s): 32-bit, 64-bit
> > > > Byte Order: Little Endian
> > > > CPU(s): 12
> > > > On-line CPU(s) list: 0-11
> > > > Vendor ID: ARM
> > > > BIOS Vendor ID: NVIDIA
> > > > Model name: Cortex-A78AE
> > > > BIOS Model name: Not Specified Not Specified CPU @ 0.0GHz
> > > > BIOS CPU family: 257
> > > > Model: 1
> > > > Thread(s) per core: 1
> > > > Core(s) per cluster: 12
> > > > Socket(s): 1
> > > > Cluster(s): 1
> > > > Stepping: r0p1
> > > > CPU(s) scaling MHz: 100%
> > > > CPU max MHz: 1971.2000
> > > > CPU min MHz: 115.2000
> > > > BogoMIPS: 62.50
> > > > Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
> > > > Caches (sum of all):
> > > > L1d: 768 KiB (12 instances)
> > > > L1i: 768 KiB (12 instances)
> > > > L2: 3 MiB (12 instances)
> > > > L3: 6 MiB (3 instances)
> > > > NUMA:
> > > > NUMA node(s): 1
> > > > NUMA node0 CPU(s): 0-11
> > > > Vulnerabilities:
> > > > Gather data sampling: Not affected
> > > > Itlb multihit: Not affected
> > > > L1tf: Not affected
> > > > Mds: Not affected
> > > > Meltdown: Not affected
> > > > Mmio stale data: Not affected
> > > > Reg file data sampling: Not affected
> > > > Retbleed: Not affected
> > > > Spec rstack overflow: Not affected
> > > > Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
> > > > Spectre v1: Mitigation; __user pointer sanitization
> > > > Spectre v2: Mitigation; CSV2, BHB
> > > > Srbds: Not affected
> > > > Tsx async abort: Not affected
> > > >
> > > > Regards,
> > > > Jan
> > >
> > >
> > > Best Regards,
> > > Yan, Zi
> > >
> >
>
> --
> Sincerely yours,
> Mike.
>