[QUESTION] Understanding in-kernel irqchip (APLIC + IMSIC) behavior with multiple devices and interrupt handling
Samuele Paone
s.paone at virtualopensystems.com
Mon Mar 10 03:06:08 PDT 2025
Hello,
I was just wondering if you had the chance to take a look at this?
Regards,
Samuele
On 2/19/25 10:19 AM, Samuele Paone wrote:
> Hello,
> thanks for the reply
>
> I am asking because we suspect that there is
> a small error in the way edge-triggered interrupts are emulated when a
> device performs a write on the irqfd.
>
> The point is that in `virt/kvm/eventfd.c` in function `irqfd_wakeup`:
> ```
> if (kvm_arch_set_irq_inatomic(&irq, kvm,
>                               KVM_USERSPACE_IRQ_SOURCE_ID, 1,
>                               false) == -EWOULDBLOCK)
>     schedule_work(&irqfd->inject);
> ```
> The level is always set to 1. In the case of an edge-triggered interrupt,
> I would expect the behavior of the `irqfd->inject` work item:
> ```
> if (!irqfd->resampler) {
>     kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1,
>                 false);
>     kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0,
>                 false);
> } else
>     kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
>                 irqfd->gsi, 1, false);
> }
> ```
>
> The point is that this can never happen, since
> `kvm_arch_set_irq_inatomic` is implemented like this:
> ```
> int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
>                               struct kvm *kvm, int irq_source_id, int level,
>                               bool line_status)
> {
>     if (!level)
>         return -EWOULDBLOCK;
>     ...
> }
> ```
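
The control flow described above can be sketched as a small userspace model. This is only an illustration, not kernel code: `struct irqfd_model`, `model_set_irq_inatomic` and `model_irqfd_wakeup` are hypothetical names, and the elided body of `kvm_arch_set_irq_inatomic` is assumed to succeed (to return something other than -EWOULDBLOCK) whenever `level` is non-zero.

```c
#include <errno.h>

/* Hypothetical counters standing in for the two injection paths. */
struct irqfd_model {
    int atomic_injections; /* fast path taken, line left asserted */
    int deferred_pulses;   /* slow path taken, line pulsed 1 then 0 */
};

/*
 * Mirrors only the quoted early return. The rest of the real function is
 * elided in the quote, so success is an ASSUMPTION of this sketch.
 */
static int model_set_irq_inatomic(struct irqfd_model *m, int level)
{
    if (!level)
        return -EWOULDBLOCK;
    m->atomic_injections++;
    return 0;
}

/* Mirrors the quoted call site: level is hard-coded to 1. */
static void model_irqfd_wakeup(struct irqfd_model *m)
{
    if (model_set_irq_inatomic(m, 1) == -EWOULDBLOCK)
        m->deferred_pulses++; /* stands in for schedule_work() */
}
```

Under that assumption, every call to `model_irqfd_wakeup` takes the atomic path and the pulsed (1 then 0) path is never reached.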
> Regards,
> Samuele
>
> On 1/31/25 6:06 PM, Anup Patel wrote:
>> On Tue, Jan 7, 2025 at 10:36 PM Samuele Paone
>> <s.paone at virtualopensystems.com> wrote:
>>> Hello everyone,
>>> I have a few questions regarding the in-kernel irqchip
>>> implementation, specifically in the context of APLIC and IMSIC for
>>> RISC-V:
>>>
>>> 1. **Behavior with Multiple Devices**: Does the current implementation
>>> support the creation of a VM with both PCI devices (that use MSI) and
>>> MMIO devices (that use wired interrupts)?
>> Yes, it's supported.
>>
>> MSIs from both PCIe and Platform devices are processed by IMSIC.
>> Wired interrupts are handled through APLIC MSI-mode where the
>> APLIC converts interrupt state change into MSI writes.
>>
>>> If such a scenario is not supported, how can I configure an MMIO
>>> device to use MSI interrupts? Would it all boil down to some device
>>> tree configuration
>>> (i.e., the MMIO device tree node "interrupts" property referencing the
>>> IMSIC rather than the APLIC directly)?
>> Platform (or MMIO) devices with MSI support don't need the "interrupts"
>> DT property; instead, the "msi-parent" DT property is used to point to
>> the IMSICs.
>>
>>> 2. I find it unclear how the variable `level` is used inside the function
>>> `kvm_riscv_aia_aplic_inject` in `arch/riscv/kvm/aia_aplic.c`:
>>>
>>> ```
>>> switch (irqd->sourcecfg & APLIC_SOURCECFG_SM_MASK) {
>>> case APLIC_SOURCECFG_SM_EDGE_RISE:
>>>     if (level && !(irqd->state & APLIC_IRQ_STATE_INPUT) &&
>>>         !(irqd->state & APLIC_IRQ_STATE_PENDING))
>>>         irqd->state |= APLIC_IRQ_STATE_PENDING;
>>>     level = 0;
>>>     break;
>>> case APLIC_SOURCECFG_SM_EDGE_FALL:
>>>     if (!level && (irqd->state & APLIC_IRQ_STATE_INPUT) &&
>>>         !(irqd->state & APLIC_IRQ_STATE_PENDING))
>>>         irqd->state |= APLIC_IRQ_STATE_PENDING;
>>>     break;
>>> case APLIC_SOURCECFG_SM_LEVEL_HIGH:
>>>     if (level && !(irqd->state & APLIC_IRQ_STATE_PENDING))
>>>         irqd->state |= APLIC_IRQ_STATE_PENDING;
>>>     break;
>>> case APLIC_SOURCECFG_SM_LEVEL_LOW:
>>>     if (!level && !(irqd->state & APLIC_IRQ_STATE_PENDING))
>>>         irqd->state |= APLIC_IRQ_STATE_PENDING;
>>>     break;
>>> }
>>>
>>> if (level)
>>>     irqd->state |= APLIC_IRQ_STATE_INPUT;
>>> else
>>>     irqd->state &= ~APLIC_IRQ_STATE_INPUT;
>>> ```
>>>
>>> Considering `edge_rise` interrupts, as far as I can see,
>>> APLIC_IRQ_STATE_INPUT is set only in the code above, yet it
>>> prevents edge_rise interrupts from being set pending.
>>> Moreover, in a system emulator that uses the in-kernel irqchip and
>>> irqfd, this variable is hard-coded to 1 in the kernel
>>> (`virt/kvm/eventfd.c`, in function `irqfd_wakeup`, when it calls
>>> `kvm_arch_set_irq_inatomic`).
>> The INPUT bit in the IRQ state represents the raw input
>> state, whereas the PENDING bit in the IRQ state indicates
>> that the given IRQ needs to be taken by the CPU.
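
The interaction between the INPUT and PENDING bits on a rising-edge source can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the kernel implementation: the `MODEL_STATE_*` values and both function names are hypothetical, `reset_level` models whether the `level = 0;` line from the quoted EDGE_RISE branch is present, and `model_claim` assumes that taking the interrupt clears only PENDING.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for APLIC_IRQ_STATE_*; the bit values are illustrative. */
#define MODEL_STATE_PENDING (1u << 0)
#define MODEL_STATE_INPUT   (1u << 1)

/*
 * Rising-edge branch of the quoted switch, followed by the INPUT update.
 * reset_level == true models the `level = 0;` line being present.
 */
static unsigned int model_edge_rise_inject(unsigned int state, int level,
                                           bool reset_level)
{
    if (level && !(state & MODEL_STATE_INPUT) &&
        !(state & MODEL_STATE_PENDING))
        state |= MODEL_STATE_PENDING;
    if (reset_level)
        level = 0;

    /* INPUT tracks the raw line level seen by this call. */
    if (level)
        state |= MODEL_STATE_INPUT;
    else
        state &= ~MODEL_STATE_INPUT;
    return state;
}

/* ASSUMPTION: taking the interrupt clears PENDING and leaves INPUT alone. */
static unsigned int model_claim(unsigned int state)
{
    return state & ~MODEL_STATE_PENDING;
}
```

In this model, two consecutive injections with `level` equal to 1 and `reset_level` false produce only one pending interrupt, because INPUT is still set when the second one arrives. With `reset_level` true, INPUT is cleared after each injection and the second one is latched as well.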
>>
>>> 3. **MSI-only Support in the irqchip**: The in-kernel irqchip
>>> documentation states that it only supports MSI. However, the RISC-V
>>> hardware documentation mentions that APLIC can take wired interrupts
>>> as input and convert them to MSI to forward to IMSIC. Could someone
>>> clarify what is meant by "MSI-only" support in this context? Are there
>>> specific constraints or implementation details that limit the irqchip
>>> to only support MSI for virtualization?
>> The KVM in-kernel AIA virtualization only has HW acceleration for
>> IMSIC whereas the APLIC MSI-mode is still trap-n-emulated.
>>
>>> 4. **Device tree node differences**: How would a device tree node
>>> backed by an MSI interrupt differ from one backed by a wired interrupt?
>> They are differentiated by the "msi-parent" and "interrupts" DT properties.
>>
>> Regards,
>> Anup