[PATCH] irqchip/sifive-plic: Fix plic_set_affinity() only enables 1 cpu
Nam Cao
namcao at linutronix.de
Mon Jul 15 12:49:43 PDT 2024
On Wed, Jul 10, 2024 at 10:19:54PM +0200, Thomas Gleixner wrote:
> On Sun, Jul 07 2024 at 16:34, Nam Cao wrote:
> > On Wed, Jul 03, 2024 at 08:31:31PM +0530, Anup Patel wrote:
> >> On Wed, Jul 3, 2024 at 6:03 PM Nam Cao <namcao at linutronix.de> wrote:
> >> > Then we should leave the job of distributing interrupts to those tools,
> >> > right? Not all use cases want minimally wasted CPU cycles. For example, a
> >> > particular interrupt may not arrive very often, but when it does, it
> >> > needs to be handled fast; in that case, clearly enabling this interrupt
> >> > on all CPUs is superior.
>
> It's not really superior at all.
>
> The problem is that a single interrupt is delivered to multiple CPUs at
> once and there is no mechanism in the hardware to select one CPU from
> the given set which can handle it quickly because it does not have
> interrupts disabled. The spec is clear about this:
>
> "The PLIC hardware only supports multicasting of interrupts, such that
> all enabled targets will receive interrupt notifications for a given
> active interrupt. Multicasting provides rapid response since the
> fastest responder claims the interrupt, but can be wasteful in
> high-interrupt-rate scenarios if multiple harts take a trap for an
> interrupt that only one can successfully claim."
>
> It's even worse. $N CPUs will in the worst case congest on the interrupt
> descriptor lock, and for edge-type interrupts this will cause the interrupt
> to be masked and marked pending, and the handling CPU is forced to unmask
> it and run another handler invocation. That's just wrong.
Hmm, I'm not sure this case can happen. A CPU does not touch the interrupt,
including taking the descriptor lock, unless it has successfully claimed the
interrupt; and only one CPU can claim it successfully.
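
As far as I can tell, the per-hart handler is roughly of this shape (a
simplified sketch from memory, not verbatim from irq-sifive-plic.c; the
chained-irq bookkeeping is omitted and names may differ between kernel
versions):

static void plic_handle_irq(struct irq_desc *desc)
{
	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
	void __iomem *claim = handler->hart_base + CONTEXT_CLAIM;
	irq_hw_number_t hwirq;

	/*
	 * The claim read returns the hwirq only to the hart that wins
	 * the claim; every other hart reads 0 and never enters the
	 * generic IRQ code, so it never takes the descriptor lock.
	 */
	while ((hwirq = readl(claim)))
		generic_handle_domain_irq(handler->priv->irqdomain, hwirq);
}
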
But it doesn't matter; your other points are enough for me to drop this
patch.
> Aside from that, this can cause loss of cache and memory locality in
> high-speed scenarios when the interrupt handler bounces around between
> CPUs.
>
> >> This is a very specific case which you are trying to optimize, in the
> >> process hurting performance in many other cases. There are many
> >> high-speed IOs (network, storage, etc.) where the interrupt rate is high,
> >> so for such IO your patch will degrade performance on multiple CPUs.
> >
> > No, it wouldn't be "hurting performance in many other cases". It would give
> > users the ability to do what they want, including hurting performance as
> > you said, or improving performance as I pointed out earlier.
>
> Kinda, but you make the bad case the default for very dubious benefits.
>
> I grant you that the current implementation, which defaults everything to
> CPU0, is suboptimal, but that's an orthogonal problem. E.g. X86 selects
> the single target CPU from the mask by spreading the interrupts out
> halfway evenly.
>
> But if you really care about low latency, then you want to selectively
> associate interrupts to particular CPUs and ensure that the associated
> processing is bound to the same CPU.
>
> > I am willing to bet that most users don't ever touch this. But if they do,
> > they better know what they are doing. If they want to waste their CPU
> > cycles, so be it.
>
> That's not what your patch does. It defaults to bad behaviour.
>
> > My point essentially is that the kernel shouldn't force any policy on users.
> > The only case where this makes sense is when the policy is _strictly_ better
> > than anything else, which is not true here. What the driver should do is
> > provide a "good enough for most" default, but still let users decide
> > what's best for them.
>
> See what I explained to you above. Changing this to multi-CPU delivery is
> not really good enough and there is a valid technical reason not to do
> that.
>
> > Side note: if I am not mistaken, the effective affinity mask thing is for
> > hardware limitations of chips that cannot enable an interrupt for all CPUs
> > in the mask. The RISC-V PLIC, on the other hand, can enable interrupts for
> > any CPU, and therefore should do so.
>
> It's not only hardware limitations which cause architectures to limit
> the CPU mask to a single target. On X86 systems which support logical
> destination or cluster mode, this was disabled even though the 'multiple
> CPUs try to handle it' problem is mitigated in hardware. The benefit is
> marginal for the common case and not sufficient for the low-latency
> requirements case.
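
Right. For reference, the current plic_set_affinity() already follows that
single-target model; roughly this shape (paraphrased from the driver, not
verbatim, helper names may differ across kernel versions):

static int plic_set_affinity(struct irq_data *d,
			     const struct cpumask *mask_val, bool force)
{
	unsigned int cpu;

	if (force)
		cpu = cpumask_first(mask_val);
	else
		cpu = cpumask_any_and(mask_val, cpu_online_mask);

	if (cpu >= nr_cpu_ids)
		return -EINVAL;

	plic_irq_disable(d);

	/* Exactly one hart ends up with the source enabled. */
	irq_data_update_effective_affinity(d, cpumask_of(cpu));

	if (!irqd_irq_disabled(d))
		plic_irq_enable(d);

	return IRQ_SET_MASK_OK_DONE;
}

So the effective affinity already ends up as a single CPU regardless of
hardware capability, which answers my side note above.
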
>
> Using a spreading algorithm for the default case will help the
> common case, but won't be sufficient for special latency-sensitive
> scenarios. Those are the scenarios where the user needs to take care and
> set the affinities correctly.
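
Understood. If the CPU0 default ever becomes a real problem, I guess the
direction would be to keep single-target delivery and just rotate the
default target over the online harts; something like this purely
hypothetical helper (plic_pick_default_cpu() does not exist in the driver,
it is only meant to illustrate the idea):

/* Hypothetical: rotate the default single target over the online CPUs. */
static unsigned int plic_pick_default_cpu(void)
{
	static atomic_t next = ATOMIC_INIT(0);
	unsigned int n, cpu;

	n = (unsigned int)atomic_fetch_inc(&next) % num_online_cpus();

	for_each_online_cpu(cpu)
		if (n-- == 0)
			return cpu;

	/* Fallback in case CPUs went offline in between. */
	return cpumask_first(cpu_online_mask);
}
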
Thanks for the explanation; I didn't see all the angles on this. Let's drop
it, then.
Best regards,
Nam