[RFC PATCH] perf/smmuv3: Fix shared interrupt handling

Fri Jul 3 09:42:14 EDT 2020

On Wed, Jun 24, 2020 at 02:08:30PM +0100, Robin Murphy wrote:
> On 2020-06-24 13:50, Will Deacon wrote:
> > On Wed, Jun 24, 2020 at 12:48:14PM +0100, Robin Murphy wrote:
> > > On 2020-04-08 17:49, Robin Murphy wrote:
> > > > IRQF_SHARED is dangerous, since it allows other agents to retarget the
> > > > IRQ's affinity without migrating PMU contexts to match, breaking the way
> > > > in which perf manages mutual exclusion for accessing events. Although
> > > > this means it's not realistically possible to support PMU IRQs being
> > > > shared with other drivers, we *can* handle sharing between multiple PMU
> > > > instances with some explicit affinity bookkeeping and manual interrupt
> > > > multiplexing.
> > > > 
> > > > RCU helps us handle interrupts efficiently without having to worry about
> > > > fine-grained locking for relatively-theoretical race conditions with the
> > > > probe/remove/CPU hotplug slow paths. The resulting machinery ends up
> > > > looking largely generic, so it should be feasible to factor out with a
> > > > "system PMU" base class for similar multi-instance drivers.
> > > > 
> > > > Signed-off-by: Robin Murphy <robin.murphy at arm.com>
> > > > ---
> > > > 
> > > > RFC because I don't have the means to test it, and if the general
> > > > approach passes muster then I'd want to tackle the aforementioned
> > > > factoring-out before merging anything anyway.
> > > 
> > > Any comments on whether it's worth pursuing this?
> > 
> > Sorry, I don't really get the problem that it's solving. Is there a crash
> > log somewhere I can look at? If all the users of the IRQ are managed by
> > this driver, why is IRQF_SHARED dangerous?
> 
> Because as-is, multiple PMU instances may make different choices about which
> CPU they associate with, change the shared IRQ affinity behind each others'
> backs, and break the "IRQ handler runs on event->cpu" assumption that perf
> core relies on for correctness. I'm not sure how likely it would be to
> actually crash rather than just lead to subtle nastiness, but wither way
> it's not good, and since people seem to be tempted to wire up system PMU
> instances this way we could do with a general approach for dealing with it.

Ok, thanks for the explanation. If we're just talking about multiple
instances of the same driver, why is it not sufficient to have a static
atomic_t initialised to -1 which tracks the current affinity and then just
CAS that during probe()? Hotplug notifiers can just check whether or not
it points to an online CPU

Will