[PATCH] serial: 8250: Avoid "too much work" from bogus rx timeout interrupt

Doug Anderson dianders at chromium.org
Mon Dec 19 09:12:40 PST 2016


On Mon, Dec 19, 2016 at 4:59 AM, Andy Shevchenko
<andriy.shevchenko at linux.intel.com> wrote:
> On Sun, 2016-12-18 at 17:14 -0800, Douglas Anderson wrote:
>> On a Rockchip rk3399-based board during suspend/resume testing, we
>> found that we could get the console UART into a state where it would
>> print this to the console a lot:
>>   serial8250: too much work for irq42
> Have you read the following discussion
> https://www.spinics.net/lists/kernel/msg2059543.html

No, I wasn't aware of that discussion.  Yup, basically the exact same
thing is happening here.  Good to know I'm not alone.  Any idea if the
Baytrail UART is also based on DesignWare IP?

In that thread, Peter said:

> I think there is every likelihood of spurious RX timeout interrupts
> tripping this patch, sorry.
> Unfortunately, I think UART_BUG_ is the only viable possibility.
> Or perhaps fixing the port type as PORT_8250 (thus disabling the fifos).

My change is slightly different than California's in that I'm actually
throwing away the bogus byte and his patch was treating it as a valid
byte.  I don't know if that makes the patch more or less palatable.

I would hate to lose access to the FIFOs just due to this weird corner case.

Do we really think there's a case where we get an RX Timeout
interrupt with no "data ready" but where "data ready" shows up
later?  Can you quantify how much later you think it will show up?  If
we can put a bound on that delay then we should probably just do a
timeout loop right where I added my patch.

Specifically, here's what's happening today with RX Timeout interrupt
without "data ready":

1. We'll get the interrupt
2. We won't do _anything_ to service the interrupt.
3. We'll return back to serial8250_interrupt(), where we'll keep
looping until we get "too much work"
4. We'll break out, but the interrupt will still be active.
5. Go back to #1

...and since this interrupt will keep firing and firing and firing
with no delay in-between, we'll effectively lock the CPU up.
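
To make the sequence above concrete, here is a minimal user-space model of
it.  This is only a sketch, not the 8250 driver code: struct sim_uart,
sim_handle_irq(), sim_interrupt() and SIM_PASS_LIMIT are hypothetical names,
and the pass limit of 512 is an assumed value.  The register bits follow the
standard 16550 layout.

```c
#include <stdbool.h>

/* 16550-style register bits (standard layout). */
#define IIR_NO_INT      0x01  /* set when no interrupt is pending */
#define IIR_RX_TIMEOUT  0x0c  /* character-timeout interrupt ID */
#define LSR_DR          0x01  /* receive data ready */

#define SIM_PASS_LIMIT  512   /* assumed loop bound for this model */

/* Hypothetical stand-in for the UART's visible state. */
struct sim_uart {
	unsigned char iir;
	unsigned char lsr;
	unsigned int rx_reads;    /* how many times the handler read RX */
};

/* Returns true if an interrupt was pending (that is, "handled"). */
static bool sim_handle_irq(struct sim_uart *u)
{
	if (u->iir & IIR_NO_INT)
		return false;

	if (u->lsr & LSR_DR) {
		/* Reading the receive buffer is what clears the timeout. */
		u->rx_reads++;
		u->lsr &= ~LSR_DR;
		u->iir = IIR_NO_INT;
	}

	/* Without LSR_DR nothing touched the hardware: iir is unchanged. */
	return true;
}

/* One interrupt entry; returns the number of passes through the loop. */
static unsigned int sim_interrupt(struct sim_uart *u, bool *too_much_work)
{
	unsigned int passes = 0;

	*too_much_work = false;
	while (sim_handle_irq(u)) {
		if (++passes > SIM_PASS_LIMIT) {
			*too_much_work = true;  /* "too much work for irq" */
			break;
		}
	}
	return passes;
}
```

Starting from iir = IIR_RX_TIMEOUT and lsr = 0, sim_interrupt() never reads
the receive buffer, hits the pass limit, and leaves iir unchanged, so the
next entry repeats the same thing.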

If there are some UARTs that eventually get themselves out of this
state by asserting "data ready" then the above won't be an "infinite"
loop but it will effectively be a tight loop where we won't let
userspace run and won't service other interrupts until we actually get
the data ready.  Since we're already blocking everything else, it
seems like it might be better to directly loop in
serial8250_handle_irq() with a timeout of some sort (how long?  100
us?  1 ms?).  Then if we hit the timeout we can do the read
and safely work ourselves free.
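
As a rough illustration of that idea, here is a user-space sketch of a
bounded wait followed by a forced read.  It is not a proposed patch:
struct poll_uart, poll_handle_irq() and the dr_after_polls field are
hypothetical, the model covers only the receive path, and the budget is
counted in LSR polls as a stand-in for a real time limit.

```c
#include <stdbool.h>

#define POLL_IIR_NO_INT      0x01  /* set when no interrupt is pending */
#define POLL_IIR_RX_TIMEOUT  0x0c  /* character-timeout interrupt ID */
#define POLL_LSR_DR          0x01  /* receive data ready */

enum poll_result { POLL_IDLE, POLL_GOT_DATA, POLL_DISCARDED };

/* Hypothetical model of a UART whose "data ready" may arrive late. */
struct poll_uart {
	unsigned char iir;
	unsigned char lsr;
	unsigned char rx;       /* contents of the receive buffer */
	int dr_after_polls;     /* DR appears after this many LSR reads; < 0 = never */
	int lsr_reads;          /* LSR reads performed so far */
};

static unsigned char poll_read_lsr(struct poll_uart *u)
{
	if (u->dr_after_polls >= 0 && u->lsr_reads >= u->dr_after_polls)
		u->lsr |= POLL_LSR_DR;
	u->lsr_reads++;
	return u->lsr;
}

/* Reading the receive buffer clears both DR and the timeout interrupt. */
static unsigned char poll_read_rx(struct poll_uart *u)
{
	u->lsr &= ~POLL_LSR_DR;
	u->iir = POLL_IIR_NO_INT;
	return u->rx;
}

static enum poll_result poll_handle_irq(struct poll_uart *u, int max_polls,
					unsigned char *byte)
{
	if (u->iir & POLL_IIR_NO_INT)
		return POLL_IDLE;

	/* Give "data ready" a bounded chance to show up. */
	for (int i = 0; i < max_polls; i++) {
		if (poll_read_lsr(u) & POLL_LSR_DR) {
			*byte = poll_read_rx(u);
			return POLL_GOT_DATA;
		}
	}

	/* Budget exhausted: read anyway to clear the interrupt, drop the byte. */
	(void)poll_read_rx(u);
	return POLL_DISCARDED;
}
```

With dr_after_polls < 0 the handler gives up after max_polls reads, throws
the bogus byte away and leaves the interrupt cleared.  With a small
non-negative value it picks up the late byte as valid data instead.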

What do others think about that?

