Random reboots on ODROID-N2+
Robin Murphy
robin.murphy at arm.com
Mon Jul 26 05:31:50 PDT 2021
On 2021-07-26 13:07, Stefan Agner wrote:
> FWIW, I did run two boards over the weekend with stress-ng vm test
> running to cause memory pressure, one board with 8a5a75e5e9e55 ("of/fdt:
> Make sure no-map does not remove already reserved regions") reverted.
> The one without the revert crashed after ~24h, the other did run through
> the weekend. Basically confirming what Byron reported.
>
> On 2021-07-26 09:54, Neil Armstrong wrote:
>> Hi,
>>
>> On 23/07/2021 21:48, Stefan Agner wrote:
>>> On 2021-07-23 19:47, Robin Murphy wrote:
>>>> On 2021-07-23 17:14, Robin Murphy wrote:
>>>>> On 2021-07-23 16:56, Stefan Agner wrote:
>>> <snip>
>>>>>>>
>>>>>>> Booting with "efi=debug" should (among other things) print the memory
>>>>>>> map at boot if you want to double-check that that is the source of the
>>>>>>> mismatch. Our EFI code should be perfectly capable of setting the
>>>>>>> memblock flag if the region *is* described appropriately, see
>>>>>>> reserve_regions() in drivers/firmware/efi/efi-init.c.
>>>>>>
>>>>>> Booting 5.12.10 with "efi=debug" on U-Boot 2021.04 gave this:
>>>>>> [ 0.000000] Machine model: Hardkernel ODROID-N2Plus
>>>>>> [ 0.000000] efi: Getting UEFI parameters from /chosen in DT:
>>>>>> [ 0.000000] efi: UEFI not found.
>>>>>> [ 0.000000] OF: fdt: Reserved memory: failed to reserve memory for
>>>>>> node 'secmon at 5000000': base 0x0000000005000000, size 3 MiB
>>>>>>
>>>>>> So it seems UEFI is not in the play here?
>>>>>
>>>>> Ah, OK, in that case I guess the question remains why does early_init_dt_reserve_memory_arch() think the region is already reserved? My instinctive assumption was an EFI memory map being present; seeing that U-Boot does indeed reflect DT reservations there *and* has had a likely-looking bug recently was then just overwhelmingly suggestive :)
>>>>
>>>> Actually, poking at U-Boot a bit more I find
>>>> meson_board_add_reserved_memory() - can you check /sys/firmware/fdt
>>>> and see if the region ends up being passed as a /memreserve/ as well
>>>> as a proper reserved-memory node?
>>>>
>>>> IIRC the semantics of /memreserve/ aren't really well-defined enough
>>>> to be suitable for the kind of things which require no-map, and my new
>>>> guess is that that's what ends up conflicting here.
>>>
>>> Seems to be present in booth:
>>
>> Indeed, in order so support any combination:
>> - upstream u-boot
>> - vendor u-boot
>> - upstream linux
>> - other OS
>>
>> The secmon is in the upstream Linux DT, and upstream u-boot reads the
>> secure memory regions
>> from the first stage bootloaders and adds them into the DT memreserve.
>>
>> It worked fine since Linux 4.10-ish, until 5.10.
>
> Just verified what is probably obvious at this point: By removing
> meson_board_add_reserved_memory() the /memreserve/ region isn't present
> and "failed to reserve memory" message disappears indeed.
>
> Why is reserving memory not enough? From what I've read no-map also make
> sure there is no VM mapping, but if the region is reserved, shouldn't
> that be enough for Linux to not access the region? I've read that no-map
> also preventsaccess due to speculation, is this what is happening here?
Almost certainly - being reserved either way means that Linux won't try
to access those pages directly, but if they are still present in the
linear map as Normal memory which allows speculation, legitimate access
to adjacent pages may well cause the CPU to end up prefetching into them.
> What is the proper solution here? Could maybe
> meson_board_add_reserved_memory() check if reserved-memory is present,
> and if so avoid adding /memreserve/?
Perhaps, although it doesn't help people who can't or don't want to
update their firmware. As I say, I'm not sure what the expectations are
supposed to be for /memreserve/, particularly if it duplicates
reserved-memory. Furthermore, looking at 8a5a75e5e9e55 I'm also not
really convinced that making the kernel boot for the sake of debugging a
fundamentally broken bootloader is a common and realistic enough issue
to justify breaking the existing not-necessarily-invalid bootloader
behaviour of other widely-deployed systems :/
Robin.
More information about the linux-arm-kernel
mailing list