[PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE

Peter Zijlstra peterz at infradead.org
Thu Jun 17 07:05:03 PDT 2021


On Thu, Jun 17, 2021 at 06:41:41AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 17, 2021, at 4:33 AM, Mark Rutland wrote:

> > Sure, and I agree we should not change cacheflush().
> > 
> > The point of membarrier(SYNC_CORE) is that you can move the cost of that
> > ISB out of the fast-path in the executing thread(s) and into the
> > slow-path on the thread which generated the code.
> > 
> > So e.g. rather than an executing thread always having to do:
> > 
> > 	LDR	<reg>, [<funcptr>]
> > 	ISB	// in case funcptr was just updated
> > 	BLR	<reg>
> > 
> > ... you have the thread generating the code use membarrier(SYNC_CORE)
> > prior to plublishing the funcptr, and the fast-path on all the executing
> > threads can be:
> > 
> > 	LDR	<reg> [<funcptr>]
> > 	BLR	<reg>
> > 
> > ... and thus I think we still want membarrier(SYNC_CORE) so that people
> > can do this, even if there are other means to achieve the same
> > functionality.
> 
> I had the impression that sys_cacheflush() did that.  Am I wrong?

Yes, sys_cacheflush() only does what it says on the tin (and only
correctly for hardware broadcast -- everything except 11mpcore).

It only invalidates the caches, but not the per CPU derived state like
prefetch buffers and micro-op buffers, and certainly not instructions
already in flight.

So anything OoO needs at the very least a complete pipeline stall
injected, but probably something stronger to make it flush the buffers.

> In any event, I’m even more convinced that no new SYNC_CORE arches
> should be added. We need a new API that just does the right thing. 

I really don't understand why you hate the thing so much; SYNC_CORE is a
means of injecting whatever instruction is required to flush all uarch
state related to instructions on all theads (not all CPUs) of a process
as efficient as possible.

The alternative is sending signals to all threads (including the
non-running ones) which is known to scale very poorly indeed, or, as
Mark suggests above, have very expensive instructions unconditinoally in
the instruction stream, which is also undesired.




More information about the linux-arm-kernel mailing list