arm64 regression in kernel 5.12 related to the (n)VHE

Florian Fainelli f.fainelli at gmail.com
Thu Aug 12 11:29:04 PDT 2021



On 8/12/2021 2:57 PM, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 13:29:56 +0100,
> Rafał Miłecki <zajec5 at gmail.com> wrote:
>>
>> On 12.08.2021 12:13, Marc Zyngier wrote:
>>> On Thu, 12 Aug 2021 09:24:14 +0100,
>>> Rafał Miłecki <zajec5 at gmail.com> wrote:
> 
> [...]
> 
>>>> I'm just an end-user with no access to CFE sources and without any
>>>> business contact at Broadcom :(
>>>
>>> I feared that would be the case. Florian's reply seems to indicate
>>> that the "upstream" firmware implementation is correct, so the OEM
>>> must have fumbled it somehow...
>>
>> Please note that Broadcom has many business units, many teams and from
>> my understanding they often don't cooperate properly.
> 
> I bet some team sampled an early version of the firmware that included
> the bug and never looked back. You can also tell the level of quality
> by the fact that it uses spin-tables to boot, that the interrupt
> controller node is incomplete...
> 
>> It's likely that BCM4908 BU screwed something up. Or maybe it's a matter
>> of CFE vs. U-Boot?
> 
> It is a matter of whatever is running at EL3 and doing the basic setup
> of the CPUs.



>>
>> Florian: does your team (set-top box and cable modem devices) use CFE or
>> U-Boot with kernels 5.12+?

Set-top-box and cable modem devices use a different boot loader (called
BOLT) and a different EL3 firmware than the 4908 implementation.
Although we don't use virtualization, we did pay attention to the
register set-up.

Just got confirmation from the team that authored the 4908 CFE that they 
*do not* set SCR_EL3.HCE on the premise that they do not use 
virtualization....
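
For anyone following along, here is a minimal sketch of what the missing
set-up amounts to. It assumes the Armv8-A layout where SCR_EL3.HCE is
bit 8; the helper names are hypothetical and are not taken from CFE,
BOLT or the kernel. With HCE clear, HVC is not usable from the lower
exception levels, which is why a kernel entered at EL2 runs into trouble.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3.HCE (Hypervisor Call Enable) is bit 8 in Armv8-A. */
#define SCR_EL3_HCE	(UINT64_C(1) << 8)

/* Hypothetical helper: does this SCR_EL3 value enable HVC? */
static bool scr_el3_hvc_enabled(uint64_t scr_el3)
{
	return (scr_el3 & SCR_EL3_HCE) != 0;
}

/*
 * Hypothetical helper: compute the value EL3 firmware would write back
 * to SCR_EL3 so that HVC works at the lower exception levels.
 * All other bits are left untouched.
 */
static uint64_t scr_el3_enable_hvc(uint64_t scr_el3)
{
	return scr_el3 | SCR_EL3_HCE;
}
```

On real hardware the value would be read and written with MRS/MSR on
SCR_EL3 while running at EL3; the helpers above only model the bit
manipulation on a plain integer.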

>>
>> It's very unlikely it's a single OEM that broke CFE with custom
>> modifications. This problem affects all 3 devices I own:
>> 1. Netgear R8000P
>> 2. TP-Link Archer C2300 V1
>> 3. Asus GT-AC5300
> 
> They probably all use the same pre-cast design with some sort of
> value-add on top.

Yes indeed.

> 
> [...]
> 
>>> That's expected. Can you please check the patch below? It should
>>> result in a booting kernel which actually survives having KVM compiled
>>> in. It should even display a warning telling you that your setup is
>>> completely buggered.
>>>
>>> That's obviously not the final version, but probably a good enough
>>> approximation.
>>
>> It seems to work! Kernel has booted and I saw:
>> CPU: CPUs started in inconsistent modes
>> WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:426 smp_cpus_done+0x8c/0xc8
>> (...)
>> kvm [1]: HYP mode not available
> 
> Right. So there is some hope. Maybe. I'm not sure I want to maintain
> this crap though.
> 
> [...]
> 
>> nand: device found, Manufacturer ID: 0xc8, Chip ID: 0xda
>> nand: ESMT NAND 256MiB 3,3V 8-bit
>> nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
>> bcm63138_nand ff801800.nand: detected 256MiB total, 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> Bad block table found at page 131008, version 0x01
>> Bad block table found at page 130944, version 0x01
>> 3 fixed-partitions partitions found on MTD device brcmnand.0
>> Creating 3 MTD partitions on "brcmnand.0":
>> 0x000000000000-0x000000100000 : "cferom"
>> 0x000000100000-0x000005800000 : "firmware"
>> 0x000005800000-0x00000af00000 : "backup"
> 
> So here's your chance! You have the firmware image here (I guess
> "cferom" is the one). It'd be interesting to disassemble it, find out
> where SCR_EL3 is set, patch it and never look back.
> 
> Only kidding.
> 
> 	M.
> 

-- 
Florian


