Boot hang with SiFive PLIC when routing I2C-HID level-triggered interrupts

Nam Cao namcao at linutronix.de
Sun Mar 17 14:27:18 PDT 2024


Cc: HID folks

On 14/Mar/2024 Eva Kurchatova wrote:
> If an I2C-HID controller level-triggered IRQ line is routed directly as
> a PLIC IRQ, and we spam input early enough in kernel boot process
> (Somewhere between initializing NET, ALSA subsystems and before
> i2c-hid driver init), then there is a chance of kernel locking up
> completely and not going any further.
> 
> There are no kernel messages printed with all the IRQ, task hang
> debugging enabled - other than (sometimes) it reports sched RT
> throttling after a few seconds. Basic timer interrupt handling is
> intact - fbdev tty cursor is still blinking.
> 
> It appears that in such a case the I2C-HID IRQ line is raised; PLIC
> notifies the (single) boot system hart, kernel claims the IRQ and
> immediately completes it by writing to CLAIM/COMPLETE register.
> No access to the I2C controller (OpenCores) or I2C-HID registers
> is made, so the HID report is never consumed and IRQ line stays
> raised forever. The kernel endlessly claims & completes IRQs
> without doing any work with the device. It doesn't always end up this
> way; sometimes boot process completes and there are no signs of
> interrupt storm or stuck IRQ processing afterwards.

It seems I2C HID's interrupt handler (i2c_hid_irq) returns immediately if
I2C_HID_READ_PENDING is set. This flag is supposed to be cleared in
i2c_hid_xfer(), but since the (threaded) interrupt handler runs at higher
priority, the flag is never cleared. So we have a lock-up: interrupt
handler won't do anything unless the flag is cleared, but the clearing of
this flag is done in a lower priority task which never gets scheduled while
the interrupt handler is active.

There is RT throttling to prevent RT tasks from locking up the system like
this. I don't know much about scheduling stuffs, so I am not really sure
why RT throttling does not work. I think because RT throttling triggers
when RT tasks take too much CPU time, but in this case hard interrupt
handlers take lots of CPU time too (~50% according to my measurement), so
RT throttling doesn't trigger often enough (in this case, it triggers once
and never again). Again, I don't know much about scheduler so I may be
talking nonsense here.

The flag I2C_HID_READ_PENDING seems to be used to make sure that only 1
I2C operation can happen at a time. But this seems pointless, because I2C
subsystem already takes care of this. So I think we can just remove it.

Can you give the below patch a try?

diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
index 2735cd585af0..799ad0ef9c4a 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -64,7 +64,6 @@
 /* flags */
 #define I2C_HID_STARTED		0
 #define I2C_HID_RESET_PENDING	1
-#define I2C_HID_READ_PENDING	2
 
 #define I2C_HID_PWR_ON		0x00
 #define I2C_HID_PWR_SLEEP	0x01
@@ -190,15 +189,10 @@ static int i2c_hid_xfer(struct i2c_hid *ihid,
 		msgs[n].len = recv_len;
 		msgs[n].buf = recv_buf;
 		n++;
-
-		set_bit(I2C_HID_READ_PENDING, &ihid->flags);
 	}
 
 	ret = i2c_transfer(client->adapter, msgs, n);
 
-	if (recv_len)
-		clear_bit(I2C_HID_READ_PENDING, &ihid->flags);
-
 	if (ret != n)
 		return ret < 0 ? ret : -EIO;
 
@@ -566,9 +560,6 @@ static irqreturn_t i2c_hid_irq(int irq, void *dev_id)
 {
 	struct i2c_hid *ihid = dev_id;
 
-	if (test_bit(I2C_HID_READ_PENDING, &ihid->flags))
-		return IRQ_HANDLED;
-
 	i2c_hid_get_input(ihid);
 
 	return IRQ_HANDLED;



More information about the linux-riscv mailing list