Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts

Eva Kurchatova nyandarknessgirl at gmail.com
Thu Mar 14 00:12:40 PDT 2024


If the level-triggered IRQ line of an I2C-HID controller is routed
directly as a PLIC IRQ, and we spam input early enough in the kernel
boot process (somewhere between the NET and ALSA subsystem
initialization and the i2c-hid driver init), then there is a chance of
the kernel locking up completely and not going any further.

No kernel messages are printed, even with all the IRQ and task hang
debugging enabled, other than an occasional report of sched RT
throttling after a few seconds. Basic timer interrupt handling is
intact: the fbdev tty cursor is still blinking.

It appears that in such a case the I2C-HID IRQ line is raised, the PLIC
notifies the (single) boot system hart, and the kernel claims the IRQ
and immediately completes it by writing to the CLAIM/COMPLETE register.
No access to the I2C controller (OpenCores) or I2C-HID registers
is made, so the HID report is never consumed and the IRQ line stays
raised forever. The kernel endlessly claims & completes IRQs
without doing any work with the device. It doesn't always end up this
way; sometimes the boot process completes and there are no signs of
an interrupt storm or stuck IRQ processing afterwards.

There was a suspicion that this has to do with the SiFive PLIC being
not-so-explicit about level-triggered interrupts. Researching the
issue led this way: there is another DT PLIC binding, the THead one,
which has a flag `PLIC_QUIRK_EDGE_INTERRUPT` that allows IRQ source
behavior to be defined with 2 cells in DT, and makes some other
changes to the logic (more on that below).
When mimicking a THead PLIC in the kernel DT and rewriting all DT
interrupt sources to use the 2-cell description, the hang ceases to
happen. Curious about the kernel-side implications of this,
I went to see what `PLIC_QUIRK_EDGE_INTERRUPT` actually does and
disabled, bit by bit, the actual differences this flag makes in the
driver logic.

This return path in irq-sifive-plic.c at line 223
(https://elixir.bootlin.com/linux/latest/source/drivers/irqchip/irq-sifive-plic.c#L223)
is only enabled for the SiFive PLIC, but not for the THead one. Removing
those 2 lines of code from the driver (whilst keeping the DT binding
properly reporting a SiFive PLIC) fixes the hang. I am not enough of an
expert on the PLIC driver to debug further or determine what a proper
fix would be, but I hope this gets more experienced devs somewhere.

This is reproducible at least from Linux 6.4.1 to Linux 6.7.9 on RVVM,
and affects any hardware that would have a SiFive PLIC + I2C-HID
combination. Most likely it would be reproducible on QEMU as well if it
had i2c-hid emulation, or if we pass through a physical I2C-HID device
and inject PLIC IRQs from its IRQ line.


