[PATCH 18/18] arm64: lto: Strengthen READ_ONCE() to acquire when CLANG_LTO=y

Mon Jul 6 15:42:37 EDT 2020

On Mon, Jul 06, 2020 at 09:23:26PM +0200, Marco Elver wrote:
> On Mon, 6 Jul 2020 at 20:35, Will Deacon <will at kernel.org> wrote:
> > On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote:
> > > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote:
> > > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote:
> > > > > Also, can you illustrate code that can only be unsafe with Clang LTO?
> > > >
> > > > I don't have a concrete example, but it's an ongoing concern over on the LTO
> > > > thread [1], so I cooked this to show one way we could deal with it. The main
> > > > concern is that the whole-program optimisations enabled by LTO may allow the
> > > > compiler to enumerate possible values for a pointer at link time and replace
> > > > an address dependency between two loads with a control dependency instead,
> > > > defeating the dependency ordering within the CPU.
> > >
> > > Why can't that happen without LTO?
> >
> > It could, but I'd argue that it's considerably less likely because there
> > is less information available to the compiler to perform these sorts of
> > optimisations. It also doesn't appear to be happening in practice.
> >
> > The current state of affairs is that, if/when we catch the compiler
> > performing harmful optimistations, we look for a way to disable them.
> > However, there are good reasons to enable LTO, so this is one way to
> > do that without having to worry about the potential impact on dependency
> > ordering.
> 
> If it's of any help, I'll see if we can implement that warning in LLVM
> if data dependencies somehow disappear (although I don't have any
> cycles to pursue right now myself). Until then, short of manual
> inspection or encountering a bug in the wild, there is no proof any of
> this happens or doesn't happen.
> 
> Also, as some anecdotal evidence it's extremely unlikely, even with
> LTO: looking at the passes that LLVM runs, there are a number of
> passes that seem to want to eliminate basic blocks, thereby getting
> rid of branches. Intuitively, it makes sense, because branches are
> expensive on most architectures (for GPU targets, I think it tries
> even harder to get rid of branches). If we extend our reasoning and
> assumptions of LTO's aggressiveness in that direction, we might
> actually end up with fewer branches. That might be beneficial for the
> data dependencies we worry about (but not so much for control
> dependencies we want to keep). Still, no point in speculating (no pun
> intended) until we have hard data what actually happens. :-)

Anything along these lines would be very welcome!!!

							Thanx, Paul