[PATCH 18/18] arm64: lto: Strengthen READ_ONCE() to acquire when CLANG_LTO=y

Mon Jul 6 15:23:26 EDT 2020

On Mon, 6 Jul 2020 at 20:35, Will Deacon <will at kernel.org> wrote:
> On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote:
> > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote:
> > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote:
> > > > Also, can you illustrate code that can only be unsafe with Clang LTO?
> > >
> > > I don't have a concrete example, but it's an ongoing concern over on the LTO
> > > thread [1], so I cooked this to show one way we could deal with it. The main
> > > concern is that the whole-program optimisations enabled by LTO may allow the
> > > compiler to enumerate possible values for a pointer at link time and replace
> > > an address dependency between two loads with a control dependency instead,
> > > defeating the dependency ordering within the CPU.
> >
> > Why can't that happen without LTO?
>
> It could, but I'd argue that it's considerably less likely because there
> is less information available to the compiler to perform these sorts of
> optimisations. It also doesn't appear to be happening in practice.
>
> The current state of affairs is that, if/when we catch the compiler
> performing harmful optimistations, we look for a way to disable them.
> However, there are good reasons to enable LTO, so this is one way to
> do that without having to worry about the potential impact on dependency
> ordering.

If it's of any help, I'll see if we can implement that warning in LLVM
if data dependencies somehow disappear (although I don't have any
cycles to pursue right now myself). Until then, short of manual
inspection or encountering a bug in the wild, there is no proof any of
this happens or doesn't happen.

Also, as some anecdotal evidence it's extremely unlikely, even with
LTO: looking at the passes that LLVM runs, there are a number of
passes that seem to want to eliminate basic blocks, thereby getting
rid of branches. Intuitively, it makes sense, because branches are
expensive on most architectures (for GPU targets, I think it tries
even harder to get rid of branches). If we extend our reasoning and
assumptions of LTO's aggressiveness in that direction, we might
actually end up with fewer branches. That might be beneficial for the
data dependencies we worry about (but not so much for control
dependencies we want to keep). Still, no point in speculating (no pun
intended) until we have hard data what actually happens. :-)

Thanks,
-- Marco