[PATCH] ARM: vfp: use asm volatile for FP control register accesses

Ard Biesheuvel ardb at kernel.org
Wed Mar 27 00:31:04 PDT 2024


On Wed, 27 Mar 2024 at 09:05, Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Wed, 27 Mar 2024 at 01:55, Nathan Chancellor <nathan at kernel.org> wrote:
> >
> > On Mon, Mar 18, 2024 at 10:30:05AM +0100, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb at kernel.org>
> > >
> > > Clang may reorder FP control register reads and writes, due to the fact
> > > that the inline asm() blocks in the read/write wrappers are not volatile
> > > qualified, and the compiler has no idea that these reads and writes may
> > > have side effects.
> > >
> > > In particular, reads of FPSCR may generate an UNDEF exception if a
> > > floating point exception is pending, and the FP emulation code in
> > > VFP_bounce() explicitly clears FP exceptions temporarily in order to be
> > > able to perform the emulation on behalf of user space. This requires
> > > that the writes to FPEXC are never reordered with respect to accesses to
> > > other FP control registers, such as FPSCR.
> > >
> > > So use asm volatile for both the read and the write helpers.
> > >
> > > Cc: <stable at kernel.org>
> > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> >
> > This seems reasonable to me based on my understanding of GCC's
> > documentation. However, their documentation states "the compiler can
> > move even volatile asm instructions relative to other code, including
> > across jump instructions" and I feel like there was some discussion
> > around this sentence in the past but I can't remember what the
> > conclusion was, although I want to say Clang did not have the same
> > behavior.
>
> The only thing that matters here is whether two asm blocks are emitted
> in a different order than they appear in the program, and I would be
> very surprised if volatile permits that. Otherwise, we might introduce
> a fake input dependency or a memory clobber instead.
>
> > Regardless:
> >
> > Acked-by: Nathan Chancellor <nathan at kernel.org>
> >
>
> Thanks.
>
> > I am just curious, how was this discovered or noticed? Was there a
> > report I missed?
> >
>
> I noticed this when building a recent kernel for the original
> Raspberry Pi, which is ARMv6 not ARMv7, and has a VFP which partially
> relies on emulation. On more recent cores, we never hit the issue
> because emulation is never needed. On older cores, there is no VFP so
> we never reach this code path either.

An alternative approach might be to do the following, which tricks the
compiler into thinking fmxr might update *current, and fmrx might
access it. Given that current == current_thread_info(), which is used
all the time in the VFP code, it will already be available in a
register, and so it doesn't require the compiler to generate any
additional code.



--- a/arch/arm/vfp/vfpinstr.h
+++ b/arch/arm/vfp/vfpinstr.h
@@ -68,14 +68,14 @@
        u32 __v;                        \
        asm(".fpu       vfpv2\n"        \
            "vmrs       %0, " #_vfp_    \
-           : "=r" (__v) : : "cc");     \
+           : "=r" (__v) : "Q" (*current)); \
        __v;                            \
  })

 #define fmxr(_vfp_,_var_)              \
        asm(".fpu       vfpv2\n"        \
-           "vmsr       " #_vfp_ ", %0" \
-          : : "r" (_var_) : "cc")
+           "vmsr       " #_vfp_ ", %1" \
+          : "=Q" (*current) : "r" (_var_))

 #else



More information about the linux-arm-kernel mailing list