[PATCH v11 07/14] irqchip: Add RISC-V incoming MSI controller early driver

Wed Oct 25 09:05:08 PDT 2023

Hi!

Anup Patel <apatel at ventanamicro.com> writes:

> On Tue, Oct 24, 2023 at 6:35 PM Björn Töpel <bjorn at kernel.org> wrote:
>>
>> Hi Anup!
>>
>> Wow, I'm really happy to see that you're moving towards the 1-1 model!
>>
>> Anup Patel <apatel at ventanamicro.com> writes:
>>
>> > The RISC-V advanced interrupt architecture (AIA) specification
>> > defines a new MSI controller called incoming message signalled
>> > interrupt controller (IMSIC) which manages MSI on per-HART (or
>> > per-CPU) basis. It also supports IPIs as software injected MSIs.
>> > (For more details refer https://github.com/riscv/riscv-aia)
>> >
>> > Let us add an early irqchip driver for RISC-V IMSIC which sets
>> > up the IMSIC state and provide IPIs.
>>
>> It would help (reviewers, and future bugfixers) if you add (here or in
>> the cover) what design decisions you've taken instead of just saying
>> that you're now supporting IMSIC.
>
> I agree with the suggestion but this kind of information should be
> in the source itself rather than commit description.

I think the high-level flow, and why you made certain design decisions
should be in the commit message.

The "how" in the code, the "why" in the commit message. Regardless -- it
would make it easier for reviewers to get into your code faster.

[...]

>> > +
>> > +     writel(IMSIC_IPI_ID, local->msi_va);
>>
>> Do you need the barriers here? If so, please document. If not, use the
>> _releaxed() version.
>
> We can't assume that _relaxed version of MMIO operations
> will work for RISC-V implementation so we conservatively
> use regular MMIO operations without _releaxed().

You'll need to expand on your thinking here, Anup. We can't just
sprinkle fences everywhere because of "we can't assume it'll work". Do
you need proper barriers for IPIs or not?

[...]

>> > +             mvec = lpriv->ids_move[i];
>> > +             lpriv->ids_move[i] = NULL;
>> > +             if (mvec) {
>> > +                     if (__imsic_id_read_clear_pending(i)) {
>> > +                             mlocal = per_cpu_ptr(imsic->global.local,
>> > +                                                  mvec->cpu);
>> > +                             writel(mvec->local_id, mlocal->msi_va);
>>
>> Again, do you need all the barriers? If yes, document. No, then relax
>> the call.
>
> Same comment as above.

Dito for me! ;-)

>> > +                     }
>> > +
>> > +                     lpriv->vectors[i].hwirq = UINT_MAX;
>> > +                     lpriv->vectors[i].order = UINT_MAX;
>> > +                     clear_bit(i, lpriv->ids_used_bitmap);
>> > +             }
>> > +
>> > +     }
>> > +     raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
>> > +}
>> > +
>> > +void imsic_local_delivery(bool enable)
>> > +{
>> > +     if (enable) {
>> > +             imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_ENABLE_EITHRESHOLD);
>> > +             imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_ENABLE_EIDELIVERY);
>> > +     } else {
>> > +             imsic_csr_write(IMSIC_EIDELIVERY, IMSIC_DISABLE_EIDELIVERY);
>> > +             imsic_csr_write(IMSIC_EITHRESHOLD, IMSIC_DISABLE_EITHRESHOLD);
>> > +     }
>>
>> My regular "early exit" nit. I guess I really dislike indentation. ;-)
>
> -ENOPARSE

if (...) {
  a();
  b();
  c();
} else {
  d();
  e();
}

vs

if (...) {
  a();
  b();
  c();
  return;
}

d();
e();

[...]

>> > +void imsic_vector_mask(struct imsic_vector *vec)
>> > +{
>> > +     struct imsic_local_priv *lpriv;
>> > +     unsigned long flags;
>> > +
>> > +     lpriv = per_cpu_ptr(imsic->lpriv, vec->cpu);
>> > +     if (WARN_ON(&lpriv->vectors[vec->local_id] != vec))
>> > +             return;
>> > +
>> > +     raw_spin_lock_irqsave(&lpriv->ids_lock, flags);
>> > +     bitmap_clear(lpriv->ids_enabled_bitmap, vec->local_id, 1);
>> > +     raw_spin_unlock_irqrestore(&lpriv->ids_lock, flags);
>> > +
>> > +     imsic_remote_sync(vec->cpu);
>>
>> x86 seems to set a timer instead, for the remote cpu cleanup, which can
>> be much cheaper, and less in instrusive. Is that applicable here?
>
> The issue with that approach is deciding the right duration
> of timer interrupt. There might be platforms who need
> immediate mask/unmask response. We can certainely
> keep improving/tuning this over-time.

Any concrete examples where this is an actual problem?

[...]

>> > +void imsic_vector_move(struct imsic_vector *old_vec,
>> > +                     struct imsic_vector *new_vec)
>> > +{
>> > +     struct imsic_local_priv *old_lpriv, *new_lpriv;
>> > +     struct imsic_vector *ovec, *nvec;
>> > +     unsigned long flags, flags1;
>> > +     unsigned int i;
>> > +
>> > +     if (WARN_ON(old_vec->cpu == new_vec->cpu ||
>> > +                 old_vec->order != new_vec->order ||
>> > +                 (old_vec->local_id & IMSIC_VECTOR_MASK(old_vec)) ||
>> > +                 (new_vec->local_id & IMSIC_VECTOR_MASK(new_vec))))
>> > +             return;
>> > +
>> > +     old_lpriv = per_cpu_ptr(imsic->lpriv, old_vec->cpu);
>> > +     if (WARN_ON(&old_lpriv->vectors[old_vec->local_id] != old_vec))
>> > +             return;
>> > +
>> > +     new_lpriv = per_cpu_ptr(imsic->lpriv, new_vec->cpu);
>> > +     if (WARN_ON(&new_lpriv->vectors[new_vec->local_id] != new_vec))
>> > +             return;
>> > +
>> > +     raw_spin_lock_irqsave(&old_lpriv->ids_lock, flags);
>> > +     raw_spin_lock_irqsave(&new_lpriv->ids_lock, flags1);
>> > +
>> > +     /* Move the state of each vector entry */
>> > +     for (i = 0; i < BIT(old_vec->order); i++) {
>> > +             ovec = old_vec + i;
>> > +             nvec = new_vec + i;
>> > +
>> > +             /* Unmask the new vector entry */
>> > +             if (test_bit(ovec->local_id, old_lpriv->ids_enabled_bitmap))
>> > +                     bitmap_set(new_lpriv->ids_enabled_bitmap,
>> > +                                nvec->local_id, 1);
>> > +
>> > +             /* Mask the old vector entry */
>> > +             bitmap_clear(old_lpriv->ids_enabled_bitmap, ovec->local_id, 1);
>> > +
>> > +             /*
>> > +              * Move and re-trigger the new vector entry based on the
>> > +              * pending state of the old vector entry because we might
>> > +              * get a device interrupt on the old vector entry while
>> > +              * device was being moved to the new vector entry.
>> > +              */
>> > +             old_lpriv->ids_move[ovec->local_id] = nvec;
>> > +     }
>>
>> Hmm, nested spinlocks, and reimplementing what the irq matrix allocator
>> does.
>>
>> Convince me why irq matrix is not a good fit to track the interrupts IDs
>> *and* get handling/tracking for managed/unmanaged interrupts. You said
>> that it was the power-of-two blocks for MSI, but can't that be enfored
>> on matrix alloc? Where are you doing the special handling of MSI?
>>
>> The reason I'm asking is because I'm pretty certain that x86 has proper
>> MSI support (Thomas Gleixner can answer for sure! ;-))
>>
>> IMSIC smells a lot like the the LAPIC. The implementation could probably
>> be *very* close to what arch/x86/kernel/apic/vector.c does.
>>
>> Am I completly off here?
>>
>
> The x86 APIC driver only supports MSI-X due to which the IRQ matrix
> allocator only supports ID/Vector allocation suitable for MSI-X whereas
> the ARM GICv3 driver supports both legacy MSI and MSI-X. In absence
> of legacy MSI support, Linux x86 will fallback to INTx for PCI devices
> with legacy MSI support but for RISC-V platforms we can't assume that
> INTx is available because we might be dealing with an IMSIC-only
> platform.

You're mixing up MSI and *multi-MSI* (multiple MSI vectors).

x86 support MSI-X, MSI, and multi-MSI with IOMMU.

Gleixner has a some insights on why one probably should *not* jump
through hoops to support multi-MSI:
https://lore.kernel.org/all/877d7yhve7.ffs@tglx/

Will we really see HW requiring multi-MSI support on RISC-V systems
without IOMMU? To me this sounds like a theoretical exercise.

> Refer, x86_vector_msi_parent_ops in arch/x86/kernel/apic/msi.c and
> X86_VECTOR_MSI_FLAGS_SUPPORTED in arch/x86/include/asm/msi.h
>
> Refer, its_pci_msi_domain_info in drivers/irqchip/irq-gic-v3-its-pci-msi.c
>
> The changes which I think are need in the IRQ matrix allocator before
> integrating it in the IMSIC driver are the following:
> 1) IRQ matrix allocator assumed NR_VECTORS to be a fixed define
>     which the arch code provides but in RISC-V world the number of
>     IDs are discovered from DT or ACPI.

Ok, let's try to be bit more explicit. Have you had a look at
kernel/irq/matrix.c?

You need to define the IRQ_MATRIX_BITS (which x86 sets to NR_VECTORS).
This is the size of the bitmap. For IMSIC this would be 2047.

The matrix allocator is an excellent fit, modulo multi-MSI. It's battle
proven code.

> 2) IRQ matrix allocator needs to be support allocator multiple vectors
>     in power-of-2 which will allow IMSIC driver to support both legacy
>     MSI and MSI-X. This will involve changing the way best CPU is
>     found, the way bitmap APIs are used and adding some new APIs
>     for allocate vectors in power-of-2

...and all the other things multi-MSI requires.

> Based on above, I suggest we keep the integration of IRQ matrix
> allocator in the IMSIC driver as a separate series which will allow
> us to unblock other series (such as AIA ACPI support, power
> managment related changes in AIA drivers, etc).

I suggest removing the multi-MSI support, and use the matrix allocator.
We have something that looks like what x86 has (IMSIC). We have a
battle-proven implementation, and helper function. In my view it would
be just weird not to piggy-back on that work, and benefit from years of
bugfixes/things we haven't thought of.

Finally; I don't see that you're handling managed interrupt in the
series (Oh, the matrix allocator has support for that! ;-)).

I realize it's some changes, but the interrupt handling is a central
piece.

If you agree with my input, LMK if you're time/work-constrained, and I
can take a stab at integrating it in the series.

Björn