ARM: aarch64: lowlevel: potential bug in arm_cpu_lowlevel_init

Mon Aug 14 04:41:29 PDT 2023

Hi Andrew,

I am asking about a patch you've introduced about 4 years ago:
Commit: cd6e1857a6a824d562bd27379d191602c074f6b7

ENTRY(arm_cpu_lowlevel_init)
	switch_el x1, 3f, 2f, 1f

3:
	mov	x0, #1			/* Non-Secure EL0/1 */
	orr	x0, x0, #(1 << 10)	/* 64-bit EL2 */
	msr	scr_el3, x0
	msr	cptr_el3, xzr

	mrs	x0, sctlr_el3
	ldr	x1, =SCTLR_ELx_FLAGS
	bic	x0, x0, x1
	msr	sctlr_el3, x0
	isb

	b	done

This code has introduced a bug in our barebox porting.
It could be our mistake but then again we couldn't find any prerequisites conditions that bootloaders need to meet before passing control to barebox pbl.
There are 2 bugs that can happen here:
1. The bootloader enabled MMU and set the SRAM (given to barebox) as non-secure – This issue can be resolved with adding "dsb sy" command before the "isb"
2. The bootloader enabled MMU and dcache on SRAM (given to barebox) as non-secure – This is a bit harder to solve because it needs to call cache invalidate on the stack.

The bootrom can run in one of the following modes to avoid the bug:
1. Run with MMU disabled. In this case the SRAM is treated as a device and in secure mode.
2. Run with MMU enabled but set the SRAM to secure mode.

Explanation of the first potential bug:
Before the barebox calls arm_cpu_lowlevel_init (from our spider_lowlevel_init function) it pushes to the stack the return address.
At this point the stack is located on SRAM in non-secure mode.
When the function arm_cpu_lowlevel_init disables the MMU, the SRAM area becomes secure and as a result, reading from the stack (by "ret") is done before the write (few cycles before) was done.
The ARM core can do that because it assumes those are not the same locations (at least I think this explains the issue because this is what we saw in RTL verification environment).
Placing a "dsb sy" before the "isb" command solved the issue.

Explanation of the second potential bug:
Same as before but now it is not enough to call "dsb sy" because the data is fetched from a different cache line (there are different cache lines for secure and non-secure modes).
So to fix this issue, the function must call data cache invalidate on the area where the current stack pointer is located for N cache lines (I assume 1 cache line would be enough but I did not tested it).

Few questions:
1. Do you agree with the above analysis?
2. If so should I prepare a patch for it?
3. Should the patch also include a fix to the data cache issue?
4. If you think the code doesn't need to change, do you agree that Barebox documentation needs to include some guidance to bootloaders?
IMHO, the barebox should behave correctly no matter the initial conditions of the bootloader.
As you wrote in your commit:

"ARM: aarch64: lowlevel: Reset SCTLR_EL3 in arm_cpu_lowlevel_init()

There's no guarantee that when arm_cpu_lowlevel_init() runs at EL3,
SCTLR will be in a state we expect it to be. Add code to reset it to a
known state, so we'd always start form clean slate. This is also
matches what we've been doing non 64-bit ARMs.

Real word motivation for this patch is i.MX8MQ whose rev 2.1 silicon
appear to have different mask ROM behaviour where it now leaves MMU
enabled if no valid boot source is found. Page table it sets up
doesn't include DDR range, so trying to bootstrap the device via
JTAG/OpenOCD results in an abort.

The value for SCTLR_ELx_FLAGS was taken from Linux kernel."

Cheers,
Lior.