[PATCH] irqchip/riscv-imsic: Fix irq migration failure issue when cpu hotplug.

Anup Patel anup at brainfault.org
Tue Feb 3 09:09:51 PST 2026


On Tue, Feb 3, 2026 at 7:05 PM Thomas Gleixner <tglx at kernel.org> wrote:
>
> On Tue, Feb 03 2026 at 16:02, Yingjun Ni wrote:
> > Add a null pointer check for irq_write_msi_msg to fix NULL pointer
> > dereference issue when migrating irq.
> >
> > Modify the return value of imsic_irq_set_affinity to let the subdomain
> > PCI-MSIX migrate the irq to a new cpu when cpu hotplug.
> >
> > Don't set vec->move_next in imsic_vector_move_update when the cpu is
> > offline, because it will never be cleared.
>
> You completely fail to explain the actual problem and the root
> cause. See
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
>
> >  drivers/irqchip/irq-riscv-imsic-platform.c | 8 ++++++--
> >  drivers/irqchip/irq-riscv-imsic-state.c    | 5 +++++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-riscv-imsic-platform.c b/drivers/irqchip/irq-riscv-imsic-platform.c
> > index 643c8e459611..131e4f2b5431 100644
> > --- a/drivers/irqchip/irq-riscv-imsic-platform.c
> > +++ b/drivers/irqchip/irq-riscv-imsic-platform.c
> > @@ -93,9 +93,13 @@ static void imsic_irq_compose_msg(struct irq_data *d, struct msi_msg *msg)
> >  static void imsic_msi_update_msg(struct irq_data *d, struct imsic_vector *vec)
> >  {
> >       struct msi_msg msg = { };
> > +     struct irq_chip *irq_chip = irq_data_get_irq_chip(d);
> > +
> > +     if (!irq_chip->irq_write_msi_msg)
> > +             return;
>
> I have no idea how this ever worked. The irq_data pointer belongs to the
> IMSIC base domain, which definitely does not have an irq_write_msi_msg()
> callback and never can have one.

imsic_irq_set_affinity() passes irq_get_irq_data(d->irq) as the irq_data
pointer to imsic_msi_update_msg(), expecting it to be the top-level
irq_data. imsic_msi_update_msg() in turn assumes that the top-level
irq_data always has an irq_write_msi_msg() callback. If that assumption
does not hold, then we need an if-check here.
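
To make the assumption concrete, here is a minimal userspace sketch of a
two-level interrupt hierarchy. It is not kernel code: the structs are
cut-down stand-ins, and every model_* name is hypothetical. It only shows
that a callback check succeeds on the top-level (per-device) irq_data and
fails on the base-domain irq_data, whose chip has no write callback.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct msi_msg { unsigned int address_lo, address_hi, data; };
struct irq_data;

struct irq_chip {
	const char *name;
	void (*irq_write_msi_msg)(struct irq_data *d, struct msi_msg *msg);
};

struct irq_data {
	unsigned int irq;
	struct irq_chip *chip;
	struct irq_data *parent_data;	/* next level towards the hardware */
};

/* Counts how many times the modelled device write callback ran. */
static int model_write_count;

static void model_pci_write_msg(struct irq_data *d, struct msi_msg *msg)
{
	(void)d;
	(void)msg;
	model_write_count++;
}

/* Base domain chip: no write callback, as in the review comment. */
static struct irq_chip model_imsic_chip = {
	.name = "model-IMSIC-base",
	.irq_write_msi_msg = NULL,
};

/* Top-most per-device chip: owns the write callback. */
static struct irq_chip model_pci_chip = {
	.name = "model-PCI-MSIX",
	.irq_write_msi_msg = model_pci_write_msg,
};

/* Returns 0 if the message was written, -1 if the chip has no callback. */
static int model_update_msg(struct irq_data *d)
{
	struct msi_msg msg = { 0 };

	if (!d->chip->irq_write_msi_msg)
		return -1;

	d->chip->irq_write_msi_msg(d, &msg);
	return 0;
}
```

With a top-level irq_data using model_pci_chip and its parent using
model_imsic_chip, model_update_msg() returns 0 for the former and -1 for
the latter. The check only matters if the pointer handed in is not the
top-level one.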

>
> The write message callback is always implemented by the top most domain,
> in this case the PCI/MSI[x] per device domain.
>
> So this code is simply broken and your NULL pointer check just makes it
> differently broken.
>
> >       imsic_irq_compose_vector_msg(vec, &msg);
> > -     irq_data_get_irq_chip(d)->irq_write_msi_msg(d, &msg);
> > +     irq_chip->irq_write_msi_msg(d, &msg);
> >  }
> >
> >  static int imsic_irq_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
> > @@ -173,7 +177,7 @@ static int imsic_irq_set_affinity(struct irq_data *d, const struct cpumask *mask
> >       /* Move state of the old vector to the new vector */
> >       imsic_vector_move(old_vec, new_vec);
> >
> > -     return IRQ_SET_MASK_OK_DONE;
> > +     return IRQ_SET_MASK_OK;
>
> Have you actually looked at the consequences of this change?

I agree. This seems unrelated and there is no explanation
in the commit description.
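
For context on why the return value matters, the sketch below models how,
to my understanding, the generic MSI set-affinity path treats the parent's
return code: the child domain composes and writes the MSI message unless
the parent reports IRQ_SET_MASK_OK_DONE. The enum and the function are
hypothetical model_* names that mirror the kernel's codes for illustration
only, and the rule itself is an assumption to be checked against
kernel/irq/msi.c.

```c
#include <stdbool.h>

/* Local mirror of the kernel's return codes; values are for this sketch. */
enum model_set_mask_ret {
	MODEL_SET_MASK_OK = 0,		/* parent done, child may write the message */
	MODEL_SET_MASK_OK_NOCOPY,	/* parent already copied the affinity mask */
	MODEL_SET_MASK_OK_DONE,		/* parent handled everything itself */
};

/*
 * Models the decision the child MSI domain makes after calling the
 * parent's set_affinity: errors (negative) and OK_DONE skip the write.
 */
static bool model_child_writes_msg(int parent_ret)
{
	return parent_ret >= 0 && parent_ret != MODEL_SET_MASK_OK_DONE;
}
```

Under this model, switching the return value from OK_DONE to OK changes
who writes the MSI message, which is why the change needs its own
explanation in the commit description.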

>
> >  }
> >
> >  static void imsic_irq_force_complete_move(struct irq_data *d)
> > diff --git a/drivers/irqchip/irq-riscv-imsic-state.c b/drivers/irqchip/irq-riscv-imsic-state.c
> > index b6cebfee9461..cd1bf9516878 100644
> > --- a/drivers/irqchip/irq-riscv-imsic-state.c
> > +++ b/drivers/irqchip/irq-riscv-imsic-state.c
> > @@ -362,6 +362,10 @@ static bool imsic_vector_move_update(struct imsic_local_priv *lpriv,
> >       /* Update enable and move details */
> >       enabled = READ_ONCE(vec->enable);
> >       WRITE_ONCE(vec->enable, new_enable);
> > +
> > +     if (!cpu_online(vec->cpu) && is_old_vec)
> > +             goto out;
>
> This is definitely not correct as this should still cleanup software
> state, no?
>
> Thanks,
>
>         tglx

Regards,
Anup


