[PATCH] ARM: vfp: Use asm volatile in fmrx/fmxr macros

Ard Biesheuvel ardb at kernel.org
Mon Jul 29 00:12:01 PDT 2024


On Sun, 28 Jul 2024 at 23:29, Calvin Owens <calvin at wbinvd.org> wrote:
>
> On Sunday 07/28 at 19:09 +0200, Ard Biesheuvel wrote:
> > (cc Arnd, Nathan, zhuqiuer)
> >
> > On Sun, 28 Jul 2024 at 10:21, Calvin Owens <calvin at wbinvd.org> wrote:
> > > <snip>
> > >
> > > Crudely grepping for vmsr/vmrs instructions in the otherwise nearly
> > > idential text for vfp_support_entry() makes the problem obvious:
> > >
> > >     vmlinux.llvm.good [0xc0101cb8] <+48>:  vmrs   r7, fpexc
> > >     vmlinux.llvm.good [0xc0101cd8] <+80>:  vmsr   fpexc, r0
> > >     vmlinux.llvm.good [0xc0101d20] <+152>: vmsr   fpexc, r7
> > >     vmlinux.llvm.good [0xc0101d38] <+176>: vmrs   r4, fpexc
> > >     vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs   r0, fpscr
> > >     vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr   fpexc, r0
> > >     vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs   r0, fpsid
> > >     vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs   r6, fpscr
> > >     vmlinux.llvm.good [0xc0101e10] <+392>: vmrs   r10, fpinst
> > >     vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs   r10, fpinst2
> > >
> > >     vmlinux.llvm.bad  [0xc0101cb8] <+48>:  vmrs   r7, fpexc
> > >     vmlinux.llvm.bad  [0xc0101cd8] <+80>:  vmsr   fpexc, r0
> > >     vmlinux.llvm.bad  [0xc0101d20] <+152>: vmsr   fpexc, r7
> > >     vmlinux.llvm.bad  [0xc0101d30] <+168>: vmrs   r0, fpscr
> > >     vmlinux.llvm.bad  [0xc0101d50] <+200>: vmrs   r6, fpscr  <== BOOM!
> > >     vmlinux.llvm.bad  [0xc0101d6c] <+228>: vmsr   fpexc, r0
> > >     vmlinux.llvm.bad  [0xc0101d70] <+232>: vmrs   r0, fpsid
> > >     vmlinux.llvm.bad  [0xc0101da4] <+284>: vmrs   r10, fpinst
> > >     vmlinux.llvm.bad  [0xc0101df8] <+368>: vmrs   r4, fpexc
> > >     vmlinux.llvm.bad  [0xc0101e5c] <+468>: vmrs   r10, fpinst2
> > >
> > > I think LLVM's reordering is valid as the code is currently written: the
> > > compiler doesn't know the instructions have side effects in hardware.
> > >
> > > Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be
> > > reordered with respect to each other. The original compiler now produces
> > > working kernels on my hardware with DYNAMIC_DEBUG=n.
> > >
> > > This is the relevant piece of the diff of the vfp_support_entry() text,
> > > from the original oopsing kernel to a working kernel with this patch:
> > >
> > >          vmrs r0, fpscr
> > >          tst r0, #4096
> > >          bne 0xc0101d48
> > >          tst r0, #458752
> > >          beq 0xc0101ecc
> > >          orr r7, r7, #536870912
> > >          ldr r0, [r4, #0x3c]
> > >          mov r9, #16
> > >         -vmrs r6, fpscr
> > >          orr r9, r9, #251658240
> > >          add r0, r0, #4
> > >          str r0, [r4, #0x3c]
> > >          mvn r0, #159
> > >          sub r0, r0, #-1207959552
> > >          and r0, r7, r0
> > >          vmsr fpexc, r0
> > >          vmrs r0, fpsid
> > >         +vmrs r6, fpscr
> > >          and r0, r0, #983040
> > >          cmp r0, #65536
> > >          bne 0xc0101d88
> > >
> > > Fixes: 4708fb041346 ("ARM: vfp: Reimplement VFP exception entry in C code")
> > > Signed-off-by: Calvin Owens <calvin at wbinvd.org>
> > > ---
> > >  arch/arm/vfp/vfpinstr.h | 48 ++++++++++++++++++++++-------------------
> > >  1 file changed, 26 insertions(+), 22 deletions(-)
> > >
> >
> > Thanks for the patch, and for the excellent analysis.
> >
> > Note that this fix has been proposed in the past, as well as another
> > one addressing the same issue, but we've been incredibly sloppy
> > getting it merged.
> >
> > https://lore.kernel.org/linux-arm-kernel/20240410024126.21589-2-zhuqiuer1@huawei.com/
> > https://lore.kernel.org/linux-arm-kernel/20240318093004.117153-2-ardb+git@google.com/
>
> Ah sorry for missing that, I searched for the symptom not the cure.
>
> > What both of us appear to have missed is that there are two versions
> > of these routines, which should either be dropped (as they are
> > obsolete now that the minimum binutils version is 2.25) or fixed up as
> > well, as you do below.
> >
> > Anyone have any thoughts on using a memory clobber as opposed to
> > volatile? I think volatile means that the access cannot be elided, but
> > it is unclear to me whether that implies any ordering. A 'memory'
> > clobber implies that globally visible state is updated, which seems
> > like a stronger guarantee to me.
>
> My thinking was that if 'asm volatile' is sufficient, it will suppress
> less optimization than the clobber, since the clobber might force the
> compiler to assume unrelated memory could have been modified when it
> really never is. But I'm not sure about that.
>
> Out of curiousity, I tried it both ways with the same compiler just now,
> the only tiny difference in the emitted vfp_support_entry() is here:
>
>         --- /volatile   2024-07-28 13:28:59.890505404 -0700
>         +++ /memclobber 2024-07-28 13:28:59.890505404 -0700
>          str r0, [r5, #0x4]
>          vmrs r7, fpexc
>          tst r7, #1073741824
>          bne 0xc0101d28
>          orr r7, r7, #1073741824
>          bic r0, r7, #-2147483648
>          vmsr fpexc, r0
>         +ldr r8, [pc, #0x26c]
>          ldr r0, [r5, #0x8]
>         -ldr r8, [pc, #0x268]
>          add r6, r5, #224
>          ldr r0, [r8, r0, lsl #2]
>          cmp r0, r6
>          beq 0xc0101d18
>

Right. So it doesn't matter in practice - good to know.

So in the 'volatile' case, I guess we are relying on the compiler not
reordering those with respect to each other, which should be
sufficient to ensure that we do not access VFP status register before
enabling the VFP unit via the control register, as all are accessed
using inline asm.

In any case, this should go into the patch system.
https://www.armlinux.org.uk/developer/patches/section.php?section=0

Note to self and other: follow-up with a patch that removes
CONFIG_AS_VFP_VMRS_FPINST, which is no longer needed.



More information about the linux-arm-kernel mailing list