[bug] aarch64 host no longer boots after 767507654c22 ("arch_numa: switch over to numa_memblks")
Mike Rapoport
rppt at kernel.org
Tue Oct 29 09:20:22 PDT 2024
On Tue, Oct 29, 2024 at 04:43:39PM +0100, Jan Stancek wrote:
> On Tue, Oct 29, 2024 at 4:07 PM Zi Yan <ziy at nvidia.com> wrote:
> >
> > +tegra mailing list and maintainers
> >
> > On 29 Oct 2024, at 8:47, Jan Stancek wrote:
> >
> > > Hi,
> > >
> > > I'm seeing a regression on an Nvidia IGX system, which no longer boots.
> > >
> > > bisect points at commit 767507654c22 ("arch_numa: switch over to numa_memblks").
> > > It hangs very early, with 4k or 64k pages, with no kernel messages printed:
> > >
> > > EFI stub: Booting Linux Kernel...
> > > EFI stub: Using DTB from configuration table
> > > EFI stub: Exiting boot services...
> > > <hangs here>
> > >
> >
> > Is it possible to get earlycon output? It is hard to debug with no
> > information other than that the kernel fails to boot.
>
> I know it was a long shot, so far I haven't had luck getting it to work.

Does it boot with numa=off and with numa=fake?

Judging by the log from the successful boot, there seems to be no NUMA
information in the device tree. Can you send the device tree as well, please?

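For reference, a rough way to check this on the kernel that still boots is to
look for "numa-node-id" properties in the live device tree. The sketch below
is only an illustration: the helper name find_numa_nodes is made up here, and
it assumes the flattened tree is exposed under /proc/device-tree and that the
property is a single 32-bit big-endian cell.

```python
import os


def find_numa_nodes(dt_root="/proc/device-tree"):
    """Map each device tree node that has a numa-node-id property to its value.

    Hypothetical helper for illustration; dt_root defaults to the usual
    location of the live device tree on a booted system.
    """
    found = {}
    for dirpath, _dirnames, filenames in os.walk(dt_root):
        if "numa-node-id" not in filenames:
            continue
        with open(os.path.join(dirpath, "numa-node-id"), "rb") as f:
            raw = f.read()
        # Device tree cells are 32-bit big-endian integers; skip short reads.
        if len(raw) >= 4:
            rel = os.path.relpath(dirpath, dt_root)
            found[rel] = int.from_bytes(raw[:4], "big")
    return found


if __name__ == "__main__":
    nodes = find_numa_nodes()
    if not nodes:
        print("no numa-node-id properties found")
    for path, node_id in sorted(nodes.items()):
        print(f"{path}: numa-node-id = {node_id}")
```

An empty result would match the impression from the log that the device tree
carries no NUMA information.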
> > Since the previous commit boots and I assume both kernels are compiled
> > with the same gcc toolchain, this should not be caused by the binutils
> > bug in 2.42[1]. Is your binutils version 2.42?
>
> Yes, both are compiled locally, with binutils 2.41
>
> >
> > Thanks.
> >
> >
> > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=31924
> >
> > > Here's a log from successful boot with previous commit:
> > > https://people.redhat.com/jstancek/aarch64_numa_boot/console-log-good.txt
> > > and config: https://people.redhat.com/jstancek/aarch64_numa_boot/config
> > >
> > > # lscpu
> > > Architecture: aarch64
> > > CPU op-mode(s): 32-bit, 64-bit
> > > Byte Order: Little Endian
> > > CPU(s): 12
> > > On-line CPU(s) list: 0-11
> > > Vendor ID: ARM
> > > BIOS Vendor ID: NVIDIA
> > > Model name: Cortex-A78AE
> > > BIOS Model name: Not Specified Not Specified CPU @ 0.0GHz
> > > BIOS CPU family: 257
> > > Model: 1
> > > Thread(s) per core: 1
> > > Core(s) per cluster: 12
> > > Socket(s): 1
> > > Cluster(s): 1
> > > Stepping: r0p1
> > > CPU(s) scaling MHz: 100%
> > > CPU max MHz: 1971.2000
> > > CPU min MHz: 115.2000
> > > BogoMIPS: 62.50
> > > Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
> > > atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc
> > > flagm paca pacg
> > > Caches (sum of all):
> > > L1d: 768 KiB (12 instances)
> > > L1i: 768 KiB (12 instances)
> > > L2: 3 MiB (12 instances)
> > > L3: 6 MiB (3 instances)
> > > NUMA:
> > > NUMA node(s): 1
> > > NUMA node0 CPU(s): 0-11
> > > Vulnerabilities:
> > > Gather data sampling: Not affected
> > > Itlb multihit: Not affected
> > > L1tf: Not affected
> > > Mds: Not affected
> > > Meltdown: Not affected
> > > Mmio stale data: Not affected
> > > Reg file data sampling: Not affected
> > > Retbleed: Not affected
> > > Spec rstack overflow: Not affected
> > > Spec store bypass: Mitigation; Speculative Store Bypass
> > > disabled via prctl
> > > Spectre v1: Mitigation; __user pointer sanitization
> > > Spectre v2: Mitigation; CSV2, BHB
> > > Srbds: Not affected
> > > Tsx async abort: Not affected
> > >
> > > Regards,
> > > Jan
> >
> >
> > Best Regards,
> > Yan, Zi
> >
>
--
Sincerely yours,
Mike.