[PATCH 03/16] tty: serial: 8250_core: read only RX if there is something in the FIFO

Tony Lindgren tony at atomide.com
Wed Feb 11 12:03:14 PST 2015


* Peter Hurley <peter at hurleysoftware.com> [150211 12:05]:
> On 02/10/2015 12:46 PM, Peter Hurley wrote:
> > On 02/10/2015 07:04 AM, Nicolas Schichan wrote:
> >> On 02/10/2015 12:34 AM, Peter Hurley wrote:
> >>> Hi Nicolas,
> >>>
> >>> Thanks for the report.
> >>>
> >> [...]
> >>>> When a caracter is received on the UART while the kernel is printing
> >>>> the boot messages, as soon as the kernel configures the UART for
> >>>> receiving (after root filesystem mount), it gets stuck printing the
> >>>> following message repeatedly:
> >>>>
> >>>> serial8250: too much work for irq29
> >>>>
> >>>> Once stuck, the reception of another character allows the boot process
> >>>> to finish.
> >>>>
> >>>> From what I can gather, when we hit that, the UART_IIR_NO_INT is 0 (so the
> >>>> interrupt is raised), but the UART_LSR_DR bit is 0 as well so the UART_RX
> >>>> register is never read to clear the interrupt.
> >>>
> >>> The "too much work" message means serial8250_handle_irq() is returning 0,
> >>> ie., not handled. Which in turn means IIR indicates no interrupt is pending
> >>> (UART_IIR_NO_INT == 1).
> >>>
> >>> Can you log the register values for LSR and IIR at both patch locations
> >>> in serial8250_do_startup()?
> >>>
> >>> (I can get you a debug patch, if necessary. Let me know)
> >>
> >> Hi Peter,
> >>
> >> Thanks for your reply.
> >>
> >> Here is what I have when the issue is triggered:
> >>
> >> [   12.154877] lsr 0x60 / iir 0x01
> >> [   12.158071] lsr 0x60 / iir 0x01
> >> [   12.161438] serial8250: too much work for irq29
> >> [   12.165982] lsr 0x60 / iir 0x0c
> >> [   12.169354] serial8250: too much work for irq29
> >> [   12.173900] lsr 0x60 / iir 0x0c
> >> (previous two messages are repeated and printk_ratelimited())
> > 
> > Thanks for this information; I see I was wrong about the cause of message.
> > 
> > I think what happens during startup is that on this silicon clearing
> > the rx fifo (by serial8250_clear_fifos()) clears data ready but not
> > the rx timeout condition which causes a spurious rx interrupt when
> > interrupts are enabled.
> > 
> > So caught between two broken UARTs: one that underflows its rx fifo because
> > of unsolicited rx reads and the other that generates spurious interrupt
> > without unsolicited rx reads.
> > 
> > 
> >> When the issue is not triggered:
> >>
> >> [   10.784871] lsr 0x60 / iir 0x01
> >> [   10.788066] lsr 0x60 / iir 0x01
> >> [   10.794734] VFS: Mounted root (nfs filesystem) readonly on device 0:13.
> >> [   10.801654] devtmpfs: mounted
> >> [   10.805169] Freeing unused kernel memory: 184K (807be000 - 807ec000)
> >> (userland takes over after that)
> >>
> >> I have also displayed the IIR and LSR registers when the "too much fork for
> >> IRQ" condition is triggered.
> >>
> >> In the serial8250_do_startup(), before the interrupt are unmasked at the end,
> >> the IIR looks sane and UART_IIR_NO_INT bit is set. When stuck
> >> serial8250_interrupt(), UART_IIR_NO_INT is cleared and the interrupt ID is set
> >> to 0xc which is not handled by the kernel at this time (the Kirkwood datasheet
> >>  indicates that it is some kind of timeout condition from what I can gather).
> > 
> > Yes, IIR == UART_IIR_RX_TIMEOUT is to used indicate that data is in the rx fifo
> > but has not reached the rx trigger level yet.
> > 
> > ATM, I'm not exactly sure if there is a safe way to clear the spurious interrupt
> > from the interrupt handler.
> > 
> > I'm fairly certain the only way to clear the rx timeout interrupt is to read
> > the rx fifo, but I think this would race with actual data arrival. IOW, there
> > might not be a way to determine if the data read is spurious or not.
> 
> Yep, I see no safe way to clear the spurious interrupt [1] and no idea how to
> keep it from happening (other than via the unsolicited RX reads in
> serial8250_do_startup).
> 
> Unfortunately, I think this means we'll have to revert Sebastian's commit:
> 
> commit 0aa525d11859c1a4d5b78fdc704148e2ae03ae13
> Author: Sebastian Andrzej Siewior <bigeasy at linutronix.de>
> Date:   Wed Sep 10 21:29:58 2014 +0200
> 
>     tty: serial: 8250_core: read only RX if there is something in the FIFO
>     
> which just means OMAP3630 will be limited to using the omap_serial driver.

Reverting makes sense to me if it has caused a regression. Maybe Sebastian
can update his patch to do this based on some quirk flag instead?

Regards,

Tony

 
> [1]  To clear the RX timeout interrupt requires reading the rx fifo even though
> LSR[data ready] indicates no data. However, this could result in dropped data
> if the data became available just before clearing the RX timeout. For example,
> 
>  CPU                                 | Device
>                                      |
>  irq handler (simplified)            |
>                                      |
>    read IIR                          |
>    is interrupt? yes                 |
>    read LSR                          |
>    is data ready? no                 |
>    is IIR == Rx timeout? yes         | new data arrives
>                                      | rx_fifo[0] = new data
>                                      | lsr[data ready] = 1
>    read RX and discard               |
>                                      |
> 



More information about the linux-arm-kernel mailing list