[PATCH] ARM: vfp: Use asm volatile in fmrx/fmxr macros

Calvin Owens calvin at wbinvd.org
Sun Jul 28 14:29:44 PDT 2024


On Sunday 07/28 at 19:09 +0200, Ard Biesheuvel wrote:
> (cc Arnd, Nathan, zhuqiuer)
> 
> On Sun, 28 Jul 2024 at 10:21, Calvin Owens <calvin at wbinvd.org> wrote:
> > <snip>
> >
> > Crudely grepping for vmsr/vmrs instructions in the otherwise nearly
> > idential text for vfp_support_entry() makes the problem obvious:
> >
> >     vmlinux.llvm.good [0xc0101cb8] <+48>:  vmrs   r7, fpexc
> >     vmlinux.llvm.good [0xc0101cd8] <+80>:  vmsr   fpexc, r0
> >     vmlinux.llvm.good [0xc0101d20] <+152>: vmsr   fpexc, r7
> >     vmlinux.llvm.good [0xc0101d38] <+176>: vmrs   r4, fpexc
> >     vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs   r0, fpscr
> >     vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr   fpexc, r0
> >     vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs   r0, fpsid
> >     vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs   r6, fpscr
> >     vmlinux.llvm.good [0xc0101e10] <+392>: vmrs   r10, fpinst
> >     vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs   r10, fpinst2
> >
> >     vmlinux.llvm.bad  [0xc0101cb8] <+48>:  vmrs   r7, fpexc
> >     vmlinux.llvm.bad  [0xc0101cd8] <+80>:  vmsr   fpexc, r0
> >     vmlinux.llvm.bad  [0xc0101d20] <+152>: vmsr   fpexc, r7
> >     vmlinux.llvm.bad  [0xc0101d30] <+168>: vmrs   r0, fpscr
> >     vmlinux.llvm.bad  [0xc0101d50] <+200>: vmrs   r6, fpscr  <== BOOM!
> >     vmlinux.llvm.bad  [0xc0101d6c] <+228>: vmsr   fpexc, r0
> >     vmlinux.llvm.bad  [0xc0101d70] <+232>: vmrs   r0, fpsid
> >     vmlinux.llvm.bad  [0xc0101da4] <+284>: vmrs   r10, fpinst
> >     vmlinux.llvm.bad  [0xc0101df8] <+368>: vmrs   r4, fpexc
> >     vmlinux.llvm.bad  [0xc0101e5c] <+468>: vmrs   r10, fpinst2
> >
> > I think LLVM's reordering is valid as the code is currently written: the
> > compiler doesn't know the instructions have side effects in hardware.
> >
> > Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be
> > reordered with respect to each other. The original compiler now produces
> > working kernels on my hardware with DYNAMIC_DEBUG=n.
> >
> > This is the relevant piece of the diff of the vfp_support_entry() text,
> > from the original oopsing kernel to a working kernel with this patch:
> >
> >          vmrs r0, fpscr
> >          tst r0, #4096
> >          bne 0xc0101d48
> >          tst r0, #458752
> >          beq 0xc0101ecc
> >          orr r7, r7, #536870912
> >          ldr r0, [r4, #0x3c]
> >          mov r9, #16
> >         -vmrs r6, fpscr
> >          orr r9, r9, #251658240
> >          add r0, r0, #4
> >          str r0, [r4, #0x3c]
> >          mvn r0, #159
> >          sub r0, r0, #-1207959552
> >          and r0, r7, r0
> >          vmsr fpexc, r0
> >          vmrs r0, fpsid
> >         +vmrs r6, fpscr
> >          and r0, r0, #983040
> >          cmp r0, #65536
> >          bne 0xc0101d88
> >
> > Fixes: 4708fb041346 ("ARM: vfp: Reimplement VFP exception entry in C code")
> > Signed-off-by: Calvin Owens <calvin at wbinvd.org>
> > ---
> >  arch/arm/vfp/vfpinstr.h | 48 ++++++++++++++++++++++-------------------
> >  1 file changed, 26 insertions(+), 22 deletions(-)
> >
> 
> Thanks for the patch, and for the excellent analysis.
> 
> Note that this fix has been proposed in the past, as well as another
> one addressing the same issue, but we've been incredibly sloppy
> getting it merged.
> 
> https://lore.kernel.org/linux-arm-kernel/20240410024126.21589-2-zhuqiuer1@huawei.com/
> https://lore.kernel.org/linux-arm-kernel/20240318093004.117153-2-ardb+git@google.com/

Ah sorry for missing that, I searched for the symptom not the cure.

> What both of us appear to have missed is that there are two versions
> of these routines, which should either be dropped (as they are
> obsolete now that the minimum binutils version is 2.25) or fixed up as
> well, as you do below.
> 
> Anyone have any thoughts on using a memory clobber as opposed to
> volatile? I think volatile means that the access cannot be elided, but
> it is unclear to me whether that implies any ordering. A 'memory'
> clobber implies that globally visible state is updated, which seems
> like a stronger guarantee to me.

My thinking was that if 'asm volatile' is sufficient, it will suppress
less optimization than the clobber, since the clobber might force the
compiler to assume unrelated memory could have been modified when it
really never is. But I'm not sure about that.

Out of curiousity, I tried it both ways with the same compiler just now,
the only tiny difference in the emitted vfp_support_entry() is here:

	--- /volatile	2024-07-28 13:28:59.890505404 -0700
	+++ /memclobber	2024-07-28 13:28:59.890505404 -0700
	 str r0, [r5, #0x4]
	 vmrs r7, fpexc
	 tst r7, #1073741824
	 bne 0xc0101d28
	 orr r7, r7, #1073741824
	 bic r0, r7, #-2147483648
	 vmsr fpexc, r0
	+ldr r8, [pc, #0x26c]
	 ldr r0, [r5, #0x8]
	-ldr r8, [pc, #0x268]
	 add r6, r5, #224
	 ldr r0, [r8, r0, lsl #2]
	 cmp r0, r6
	 beq 0xc0101d18

Thanks,
Calvin

> In any case, let's work together to get /some/ version of this fix merged asap.
> 
> > diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
> > index 3c7938fd40aa..32090b0fb250 100644
> > --- a/arch/arm/vfp/vfpinstr.h
> > +++ b/arch/arm/vfp/vfpinstr.h
> > @@ -64,33 +64,37 @@
> >
> >  #ifdef CONFIG_AS_VFP_VMRS_FPINST
> >
> > -#define fmrx(_vfp_) ({                 \
> > -       u32 __v;                        \
> > -       asm(".fpu       vfpv2\n"        \
> > -           "vmrs       %0, " #_vfp_    \
> > -           : "=r" (__v) : : "cc");     \
> > -       __v;                            \
> > - })
> > -
> > -#define fmxr(_vfp_,_var_)              \
> > -       asm(".fpu       vfpv2\n"        \
> > -           "vmsr       " #_vfp_ ", %0" \
> > -          : : "r" (_var_) : "cc")
> > +#define fmrx(_vfp_) ({                         \
> > +       u32 __v;                                \
> > +       asm volatile (".fpu     vfpv2\n"        \
> > +                     "vmrs     %0, " #_vfp_    \
> > +                    : "=r" (__v) : : "cc");    \
> > +       __v;                                    \
> > +})
> > +
> > +#define fmxr(_vfp_, _var_) ({                  \
> > +       asm volatile (".fpu     vfpv2\n"        \
> > +                     "vmsr     " #_vfp_ ", %0" \
> > +                    : : "r" (_var_) : "cc");   \
> > +})
> >
> >  #else
> >
> >  #define vfpreg(_vfp_) #_vfp_
> >
> > -#define fmrx(_vfp_) ({                 \
> > -       u32 __v;                        \
> > -       asm("mrc p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmrx   %0, " #_vfp_    \
> > -           : "=r" (__v) : : "cc");     \
> > -       __v;                            \
> > - })
> > -
> > -#define fmxr(_vfp_,_var_)              \
> > -       asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr   " #_vfp_ ", %0" \
> > -          : : "r" (_var_) : "cc")
> > +#define fmrx(_vfp_) ({                                         \
> > +       u32 __v;                                                \
> > +       asm volatile ("mrc p10, 7, %0, " vfpreg(_vfp_) ","      \
> > +                     "cr0, 0 @ fmrx    %0, " #_vfp_            \
> > +                    : "=r" (__v) : : "cc");                    \
> > +       __v;                                                    \
> > +})
> > +
> > +#define fmxr(_vfp_, _var_) ({                                  \
> > +       asm volatile ("mcr p10, 7, %0, " vfpreg(_vfp_) ","      \
> > +                     "cr0, 0 @ fmxr    " #_vfp_ ", %0"         \
> > +                    : : "r" (_var_) : "cc");                   \
> > +})
> >
> >  #endif
> >
> > --
> > 2.39.2
> >



More information about the linux-arm-kernel mailing list