[RFC please help] membarrier: Rewrite sync_core_before_usermode()

Andy Lutomirski luto at kernel.org
Mon Dec 28 12:14:23 EST 2020

On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin
<linux at armlinux.org.uk> wrote:
> On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
> > On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers
> > <mathieu.desnoyers at efficios.com> wrote:
> > >
> > > ----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto at kernel.org wrote:
> > >
> >
> > > >
> > > > I admit that I'm rather surprised that the code worked at all on arm64,
> > > > and I'm suspicious that it has never been very well tested.  My apologies
> > > > for not reviewing this more carefully in the first place.
> > >
> > > Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
> > >
> > > It clearly states that only arm, arm64, powerpc and x86 support the membarrier
> > > sync core feature as of now:
> >
> > Sigh, I missed arm (32).  Russell or ARM folks, what's the right
> > incantation to make the CPU notice instruction changes initiated by
> > other cores on 32-bit ARM?
> You need to call flush_icache_range(), since the changes need to be
> flushed from the data cache to the point of unification (of the Harvard
> I and D), and the instruction cache needs to be invalidated so it can
> then see those updated instructions. This will also take care of the
> necessary barriers that the CPU requires for you.

With what parameters?   From looking at the header, this is for the
case in which the kernel writes some memory and then intends to
execute it.  That's not what membarrier() does at all.  membarrier()
works like this:

User thread 1:

write to RWX memory *or* write to an RW alias of an X region.
somehow tell thread 2 that we're ready (with a store release, perhaps,
or even just a relaxed store.)

User thread 2:

wait for the indication from thread 1.
jump to the code.

membarrier() is, for better or for worse, not given a range of addresses.

On x86, the documentation is a bit weak, but a strict reading
indicates that thread 2 must execute a serializing instruction at some
point after thread 1 writes the code and before thread 2 runs it.
membarrier() does this by sending an IPI and ensuring that a
"serializing" instruction (thanks for great terminology, Intel) is
executed.  Note that flush_icache_range() is a no-op on x86, and I've
asked Intel's architects to please clarify their precise rules.  No
response yet.

On arm64, flush_icache_range() seems to send an IPI, and that's not
what I want.  membarrier() already does an IPI.

I suppose one option is to revert support for this membarrier()
operation on arm, but it would be nice to just implement it instead.


More information about the linux-arm-kernel mailing list