[BUG] ARM64 regression: NULL pointer dereference in arm_smccc_version_init+0x90/0x1ac
Mark Rutland
mark.rutland at arm.com
Fri Jan 31 05:54:57 PST 2025
On Fri, Jan 31, 2025 at 12:41:04PM +0000, Will Deacon wrote:
> On Thu, Jan 30, 2025 at 03:56:10PM +0100, Emanuele Rocca wrote:
> > On 2025-01-30 12:19, Mark Rutland wrote:
> > > For the benefit others, when we looked into this a few days ago it
> > > appeared that a GPR was being clobbered across an SMCCC call, resulting
> > > in a later crash (as that GPR should hold the ADRP'd base address of
> > > 'smccc_version'). I didn't have the time to dig more into that (e.g. to
> > > figure out whether kernel/compiler/firmware was to blame).
> > >
> > > Emanuele, could you please dump the result of:
> > >
> > > objdump --disassemble=arm_smccc_version_init vmlinux
> > >
> > > ... for this kernel? That'd make it possible for others to
> > > perform/verify the analysis I mentioned above.
> >
> > Sure, here it is:
> > https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.objdump.txt
> >
> > > If you can share any more details from the crash, that'd be helpful. The
> > > GPR dump would be *enormously* helpful in this case, and even a photo of
> > > the crash log might be useful.
> >
> > pc : arm_smccc_version_init+0x90/0x1ac
> > lr : psci_probe+0x1fc/0x2c0
> > sp : ffffa3d8eea33ca0
> > x29: ffffa3d8eea33ce0 x28: 0000000081000200 x27: ffffa3d8edd72a80
> > x26: ffffa3d8edd728b0 x25: ffffa3d8eec66248 x24: ffffa3d8edc528d8
> > x23: ffffa3d8eed7f4f8 x22: 0000000000000001 x21: ffffa3d8eea58410
> > x20: ffffa3d8eed7f000 x19: 0000000000010001 x18: 0000000000000006
> > x17: 0000000009f69730 x16: 00000008760968d0 x15: 0000000000000000
> > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > x11: 00000000ffffefff x10: ffffa3d8eeac09c8 x9 : ffffa3d8eea687d8
> > x8 : ffffa3d8eea33cb8 x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : ffffa3d8eed7f000 x4 : ffffa3d8edeb0000 x3 : 0000000000000000
> > x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
> >
> > Call trace:
> > arm_smccc_version_init+0x90/0x1ac
> > psci_0_2_init+0x20/0x2c
> > psci_1_0_init+0x1c/0x58
> > psci_dt_init+0x6c/0x98
> > setup_arch+0x400/0x5ec
> > start_kernel+0x90/0x788
> > __primary_switched+0xbc/0xc4
> > Code: d29fffe1 eb01001f 1a9f97e0 f0ffd384 (b94264c2)
> >
> > The above was transcribed by hand. I double-checked, but there may be
> > mistakes.
> >
> > Photo here: https://people.debian.org/~ema/Z5OpGluX6oX5NLxh@NH27D9T0LF.jpg
>
> Crikey, you transcribed all that junk by _hand_?! Thanks for the effort,
> but next time please don't waste your time! The screenshot is fine.
>
> Anyway, it looks like x6 gets trashed across the SMC. I think that's fine
> per the SMCCC spec:
>
> | Unused result and scratch registers can leak information after an
> | SMC or HVC call. An implementation can mitigate this risk by either
> | preserving the register state over the call, or returning a constant
> | value,such as zero, in each register.
I think that's misleading/stale and x6 specifically is not supposed to
be clobbered. Shortly before that there's wording about preserving the
registers, summary/history below:
* SMCCC v1.0 allowed x0 to x17 to be clobbered over a call; all other
GPRs had to be preserved.
That was defined in ARM DEN 0028B, AKA "SMC CALLING CONVENTION":
https://developer.arm.com/documentation/den0028/b/?lang=en
* SMCCC v1.1 mandated that x4 to x17 were preserved. This was critical
for some of the ARCH_WORKAROUND_* calls made from kernel/kvm entry
asm.
That was defined in ARM DEN 0070, AKA "Firmware interfaces for
mitigating cache speculation vulnerabilities":
https://developer.arm.com/cache-speculation-vulnerability-firmware-specification
* SMCCC v1.2 relaxed things such that x4 to x17 must be preserved
*unless* they contained results of the function.
That was defined in ARM DEN 0028C, AKA "SMC CALLING CONVENTION":
https://developer.arm.com/documentation/den0028/c/?lang=en
.. which folded inthe requirements from ARM DEN 0070, with the
relaxation above, but also kept the stale not from ARM DEN 0028B.
In Linux we depend on x4 to x17 not being clobbered *unless* they're
being used to return a result.
The current wording in the spec is messed up, and suggests that SMC32
calls could clobber the upper 32 bits of x4 to x17, which is inccorect
and would break the ARCH_WORKAROUND_* calls mentioend above (given those
are SMC32). I'll go chase the SMCCC architects about that; it's clearly
an unintentional error when reformatting the document.
The *intent* is that x4 to x17 are preserved here. AFAICT, nothing in
arm_smccc_version_init() returns a value in x4 to x17. If any of those
are clobbered for an SMCCCv1.1 call, it's definitely a firmware bug.
> so the question is whether or not the compiler should be relying on the
> register being preserved. It looks to me like the arm_smccc_1_1_smc()
> macro is actually broken in this regard: it constructs the input list
> based on the number of arguments passed and doesn't take the above into
> account at all.
As above, I think that arm_smccc_1_1_smc() is correct; it *always*
clobbers x0 to x3, and cannot be used for SMCCCv1.2+ calls that return
values in x4 to x17.
That doesn't change the fact that we'll need a workaround, though.
> So I think we should change that so unused arguments are clobbered
> instead. Mark -- what do you think?
As above, if FW gets this wrong for calls made from entry asm, the
results could be catastrophic, so it'd be good if we could detect this
and log a FW BUG the first time we hit it.
We'd have to assume that any of x0 to x17 could be clobbered, and at
that point I suspect it'd be better to move this out-of-line for almost
all callers, which we effectively already have in the form of the
arm_smccc_1_2_{smc,hvc}() trampolines. We could bounce arm_smccc_1_1_*()
calls through those.
Mark.
More information about the linux-arm-kernel
mailing list