arm64 regression in kernel 5.12 related to the (n)VHE

Marc Zyngier maz at kernel.org
Wed Aug 11 23:51:23 PDT 2021


On Wed, 11 Aug 2021 17:55:07 +0100,
Rafał Miłecki <zajec5 at gmail.com> wrote:
> 
> On 11.08.2021 14:50, Marc Zyngier wrote:
> > On Wed, 11 Aug 2021 13:15:31 +0100,
> > Rafał Miłecki <zajec5 at gmail.com> wrote:
> >> 
> >> Hi,
> >> 
> >> I just tried upgrading from the good old LTS kernel 5.10 and
> >> discovered that my bcm4908 boards don't boot anymore with 5.14-rc5.
> >> 
> >> 
> >> The problem is that the kernel doesn't seem to start booting at all.
> >> I see the CFE bootloader messages:
> >> 
> >> Starting program at 0x0000000000080000
> >> /memory = 0x40000000
> >> 
> >> and then nothing. Normally the first kernel line should follow, e.g.:
> >> Linux version 5.11.0-rc4 (rmilecki at localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #30 SMP Wed Aug 11 14:01:00 CEST 2021
> >> 
> >> 
> >> I have zero knowledge of low level arm64 or assembler stuff. I also
> >> don't own any bcm4908 development board or bcm4908 datasheets.
> >> 
> >> All I could do to help debug this regression was bisect it. The
> >> first bad commit (I verified it after the bisection) is:
> >> 
> >> commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
> >> Author: Marc Zyngier <maz at kernel.org>
> >> Date:   Mon Feb 8 09:57:14 2021 +0000
> >> 
> >>      arm64: Initialise as nVHE before switching to VHE
> >> 
> >>      As we are aiming to be able to control whether we enable VHE or
> >>      not, let's always drop down to EL1 first, and only then upgrade
> >>      to VHE if at all possible.
> >> 
> >>      This means that if the kernel is booted at EL2, we always start
> >>      with a nVHE init, drop to EL1 to initialise the kernel, and
> >>      only then upgrade the kernel EL to EL2 if possible (the process
> >>      is obviously shortened for secondary CPUs).
> >> 
> >>      The resume path is handled similarly to a secondary CPU boot.
> >> 
> >>      Signed-off-by: Marc Zyngier <maz at kernel.org>
> >>      Acked-by: David Brazdil <dbrazdil at google.com>
> >>      Acked-by: Catalin Marinas <catalin.marinas at arm.com>
> >>      Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
> >>      [will: Avoid calling switch_to_vhe twice on kaslr path]
> >>      Signed-off-by: Will Deacon <will at kernel.org>
> >> 
> >> 
> >> Could you look at this issue, please? I'm happy to test any patches or
> >> provide any extra info I can obtain using kernel 5.11.
> >> 
> >> 
> >> My defconfig for bcm4908 is:
> > 
> > [...]
> > 
> > I don't think the defconfig is that relevant (nothing you quote here
> > would have an impact that early in the boot process).
> > 
> > On the other hand, a description of the platform (what CPUs does it
> > have) and how it boots (VHE, non-VHE, booted at EL2 or not) would be
> > extremely useful. At minimum, a boot log of a working kernel could
> > help.
> 
> Thank you for your patience & reply.
> 
> BCM4908 is Broadcom's 64-bit platform with Broadcom's own Brahma-B53
> CPU(s). I don't know how it boots. Is that something I can find out
> from a running system?
> 
> For DTS SoC description you can check:
> arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi
> 
> See below for the boot log and /proc/cpuinfo. Please note I seem to
> have the console misconfigured, so the early part of the log appears
> twice (nothing really harmful).
> 
> Starting program at 0x0000000000080000
> /memory = 0x40000000
> WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
> WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
> WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
> Linux version 5.11.22-g40462c7f0649 (rmilecki at localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #9 SMP Wed Aug 11 18:39:58 CEST 2021
> Machine model: Asus GT-AC5300
> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
> printk: bootconsole [bcm63xx_uart0] enabled
> efi: UEFI not found.
> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
> Zone ranges:
>   DMA      [mem 0x0000000000000000-0x000000003fffffff]
>   DMA32    empty
>   Normal   empty
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000000000-0x000000003fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
> percpu: Embedded 17 pages/cpu s37856 r0 d31776 u69632
> Detected VIPT I-cache on CPU0
> CPU features: detected: ARM erratum 843419
> Built 1 zonelists, mobility grouping on.  Total pages: 258048
> Kernel command line: earlycon=bcm63xx_uart,0xff800640
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> mem auto-init: stack:off, heap alloc:off, heap free:off
> Memory: 1020660K/1048576K available (3584K kernel code, 650K rwdata, 684K rodata, 2368K init, 229K bss, 27916K reserved, 0K cma-reserved)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> rcu: Hierarchical RCU implementation.
> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> GIC: Using split EOI/Deactivate mode
> random: get_random_bytes called from start_kernel+0x33c/0x524 with crng_init=0
> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> printk: console [tty0] enabled
> printk: bootconsole [bcm63xx_uart0] disabled
> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
> Linux version 5.11.22-g40462c7f0649 (rmilecki at localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #9 SMP Wed Aug 11 18:39:58 CEST 2021
> Machine model: Asus GT-AC5300
> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
> printk: bootconsole [bcm63xx_uart0] enabled
> efi: UEFI not found.
> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
> Zone ranges:
>   DMA      [mem 0x0000000000000000-0x000000003fffffff]
>   DMA32    empty
>   Normal   empty
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000000000-0x000000003fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
> percpu: Embedded 17 pages/cpu s37856 r0 d31776 u69632
> Detected VIPT I-cache on CPU0
> CPU features: detected: ARM erratum 843419
> Built 1 zonelists, mobility grouping on.  Total pages: 258048
> Kernel command line: earlycon=bcm63xx_uart,0xff800640
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> mem auto-init: stack:off, heap alloc:off, heap free:off
> Memory: 1020660K/1048576K available (3584K kernel code, 650K rwdata, 684K rodata, 2368K init, 229K bss, 27916K reserved, 0K cma-reserved)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> rcu: Hierarchical RCU implementation.
> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> GIC: Using split EOI/Deactivate mode
> random: get_random_bytes called from start_kernel+0x33c/0x524 with crng_init=0
> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> printk: console [tty0] enabled
> printk: bootconsole [bcm63xx_uart0] disabled
> Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
> rcu: Hierarchical SRCU implementation.
> EFI services will not be available.
> smp: Bringing up secondary CPUs ...
> Detected VIPT I-cache on CPU1
> CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
> Detected VIPT I-cache on CPU2
> CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
> Detected VIPT I-cache on CPU3
> CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
> smp: Brought up 1 node, 4 CPUs
> SMP: Total of 4 processors activated.
> CPU features: detected: 32-bit EL0 Support
> CPU features: detected: CRC32 instructions
> CPU: All CPU(s) started at EL2

Interestingly, all your CPUs are booting at EL2. Which is great.  Can
you try and enable KVM on your existing 5.10 kernel? Just selecting
CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
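
On a 5.10 arm64 tree, KVM sits behind the Virtualization menu, so the
relevant .config fragment would look roughly like this (a sketch, not a
complete config):

```
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
```

With KVM built in, a kernel running at EL1 has to issue HVC during KVM
initialisation, which makes this a cheap test of whether HVC works at
all on this platform.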

My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
the HVC instruction UNDEFs at EL1. That would be bad news.
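
The two facts this suspicion rests on (which CPU this is, and which
exception level it was entered at) can be read mechanically from a boot
log like the one quoted above. A minimal sketch, assuming the log has
been saved as text; the function names are made up for illustration,
and only the MIDR_EL1 field layout and the two log line formats are
taken as given:

```python
import re

# Matches e.g. "Booting Linux on physical CPU 0x0000000000 [0x420f1000]"
BOOT_RE = re.compile(
    r"Booting Linux on physical CPU 0x[0-9a-fA-F]+ \[0x([0-9a-fA-F]{8})\]"
)
# Matches e.g. "CPU: All CPU(s) started at EL2"
EL_RE = re.compile(r"CPU: All CPU\(s\) started at (EL[12])")


def decode_midr(midr):
    """Split a 32-bit MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24]
        "variant": (midr >> 20) & 0xF,        # bits [23:20]
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "partnum": (midr >> 4) & 0xFFF,       # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0]
    }


def summarise_boot_log(text):
    """Return the decoded MIDR and the boot EL, or None where not found."""
    midr = BOOT_RE.search(text)
    el = EL_RE.search(text)
    return {
        "midr": decode_midr(int(midr.group(1), 16)) if midr else None,
        "boot_el": el.group(1) if el else None,
    }


if __name__ == "__main__":
    sample = (
        "Booting Linux on physical CPU 0x0000000000 [0x420f1000]\n"
        "CPU: All CPU(s) started at EL2\n"
    )
    print(summarise_boot_log(sample))
```

For the value in this log, 0x420f1000, the implementer field is 0x42
and the part number is 0x100, which is what the kernel's cputype.h
lists for Broadcom and Brahma-B53 respectively.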

Please let me know.

	M.

-- 
Without deviation from the norm, progress is not possible.


