[RFC PATCH] tty: pl011: Avoid stuck-off spurious interrupts

Wei Xu xuwei5 at hisilicon.com
Tue Jan 30 02:18:36 PST 2018


Hi Dave,

On 2018/1/30 9:25, Dave Martin wrote:
> On Mon, Jan 29, 2018 at 05:50:19PM +0000, Wei Xu wrote:
>> Hi Dave,
>>
>> On 2018/1/29 16:09, Dave Martin wrote:
>>> Commit 9b96fbacda34 ("serial: PL011: clear pending interrupts")
>>> clears the RX and receive timeout interrupts on pl011 startup, to
>>> avoid a screaming-interrupt scenario that can occur when the
>>> firmware or bootloader leaves these interrupts spuriously
>>> asserted.
>>>
>>> This has been noted as an issue when running Linux on qemu [1].
>>>
>>> Unfortunately, the above fix seems to lead to potential
>>> misbehaviour if the RX FIFO interrupt is asserted _non_ spuriously
>>> on driver startup, if the RX FIFO is also already full to the
>>> trigger level.
>>>
>>> Clearing the RX FIFO interrupt does not change the FIFO fill level.
>>> In this scenario, because the interrupt is now clear and because
>>> the FIFO is already full to the trigger level, no new assertion of
>>> the RX FIFO interrupt can occur unless the FIFO is drained back
>>> below the trigger level.  This never occurs because the pl011
>>> driver is waiting for an RX FIFO interrupt to tell it that there is
>>> something to read, and does not read the FIFO at all until that
>>> interrupt occurs.
>>>
>>> Thus, simply clearing "spurious" interrupts on startup may be
>>> misguided, since there is no way to be sure that the interrupts are
>>> truly spurious, and things can go wrong if they are not.
>>>
>>> This patch attempts to handle (suspected) spurious interrupts more
>>> robustly, by allowing the interrupt(s) to fire but quenching the
>>> scream.
>>>
>>> pl011_int() runs and attempts to drain the FIFO anyway just as if
>>> the interrupts were real.  If the FIFO is already empty, great.  To
>>> avoid a screaming spurious interrupt, the RX FIFO and timeout
>>> interrupts are now explicitly cleared in between committing to
>>> drain the RX FIFO and actually draining it.  We do not have to
>>> worry about lost interrupts here, because we are effectively in
>>> polled mode inside pl011_int() until the RX FIFO becomes empty:
>>>
>>>   * A new char received before the RX FIFO is fully drained will be
>>>     drained out synchronously by pl011_int() along with the other
>>>     chars already pending.  A new char received after the RX FIFO
>>>     is drained will result in correct RX FIFO interrupt assertion,
>>>     because emptying the RX FIFO guarantees that the RX FIFO /
>>>     timeout interrupt state machines are back in a sane state.
>>>
>>>   * A new RX timeout before the RX FIFO is fully drained is no
>>>     problem, because pl011_int() has already committed to emptying
>>>     the FIFO at this point, guaranteeing that no stray chars will
>>>     be left behind.  A new RX timeout after the RX FIFO is fully
>>>     drained will result in correct interrupt assertion.
>>>
>>> This patch does not attempt to address the case where the RX FIFO
>>> fills faster than it can be drained: that is a pathological
>>> condition that is beyond the scope of the driver to work around.
>>> Users cannot expect this to work unless they enable hardware flow
>>> control.
>>>
>>> [1] [Qemu-devel] [Qemu-arm] [PATCH] pl011: do not put into fifo
>>> before enabled the interruption
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg06446.html
>>>
>>> Reported-by: Wei Xu <xuwei5 at hisilicon.com>
>>> Cc: Russell King <linux at armlinux.org.uk>
>>> Cc: Linus Walleij <linus.walleij at linaro.org>
>>> Cc: Peter Maydell <peter.maydell at linaro.org>
>>> Fixes: 9b96fbacda34 ("serial: PL011: clear pending interrupts")
>>> Signed-off-by: Dave Martin <Dave.Martin at arm.com>
>>> ---
>>>  drivers/tty/serial/amba-pl011.c | 6 +-----
>>>  1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>
>> After commented the 1549 line in the amba-pl011.c[1] and applied this patch,
>> the console is not hanged any more.
>>
>> The UART011_ICR is cleared in the pl011_hwinit that will clear the
>> RX interruption as well.
>> Is it OK to remove 1549 as well?
>> Thanks!
> 
> Hmm, that probably does make sense: I was looking only at the
> pl011_startup() path, but the clearing of interrupts in pl011_hwinit()
> appears like it can lead to the same problem, as you say.
> 
> So I think that it makes sense to remove RTIS and RXIS from the status
> bit clearing on line 1550.  To avoid accidentally breaking any existing
> behaviour, it may be safer to leave the other interrupts alone.
> 
> So does the following work?

Yes, it works with below patch.

> 
> (Note, have you tried to reproduce the bug and test the fix on mainline?
> The amba-pl011 driver in v4.4 looks a little different here.)

I can reproduce it with 4.15-rc9 Kernel and 2.11.0 QEMU without your patches.
And the issue is gone after applied your patches.
The log and commands are as below.

#The QEMU command is as:

	qemu-system-aarch64 -m 1024 -cpu host -M virt -nographic -kernel Image \
	-initrd rootfs.cpio.gz --enable-kvm \
	-device virtio-net-device,mac=9a:eb:ec:ed:ee:ef,id=idKUWkoN,netdev=idrOKidE \
	-netdev user,id=idrOKidE,hostfwd=tcp::5000-:22 \
	-append "ip=dhcp console=ttyAMA0 root=/dev/vda -m 4096 rw earlycon=pl011,0x9000000" 	

#The console is hung and the log of the console is as:

	root at ubuntu:~#
	> -device virtio-net-device,mac=9a:eb:ec:ed:ee:ef,id=idKUWkoN,netdev=idrOKidE \
	>  -netdev user,id=idrOKidE,hostfwd=tcp::5000-:22 \d=idKUWkoN,netdev=idrOKidE  \
	0000" end "ip=dhcp console=ttyAMA0 root=/dev/vda -m 4096 rw earlycon=pl011,0x900
	[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd082]
	[    0.000000] Linux version 4.15.0-rc9 (xuwei at EstBuildSvr1) (gcc version 4.9.2 20140904 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09)) #176 SMP PREEMPT Tue Jan 30 18:00:2
	6 CST 2018

	......

	[    1.733403] ALSA device list:
	[    1.734046]   No soundcards found.
	[    1.734901] uart-pl011 9000000.pl011: no DMA platform data
	[    1.737619] Freeing unused kernel memory: 1600K
	Starting rcS...
	++ Mounting filesystem
	++ Setting up mdev
	++ Starting telnet daemon
	++ Starting http daemon
	++ Starting ftp daemon
	++ Starting dropbear (ssh) daemon
	rcS Complete
	Welcome to Mini Linux
	GNU/Linux 4.15.0-rc9 aarch64
	estuary:/$

#And you can found that there is no pl011 interruption any more
and the the log from the ssh console is as:

	estuary:~$ cat /proc/interrupts
			   CPU0
	  3:        331     GIC-0  27 Level     arch_timer
	 36:         47     GIC-0  79 Edge      virtio0
	 38:          0     GIC-0  34 Level     rtc-pl031
	 39:          0     GIC-0  33 Level     uart-pl011
	 40:          0     GIC-0  23 Level     arm-pmu
	 41:          0     pl061   3 Edge      GPIO Key Poweroff
	IPI0:         0       Rescheduling interrupts
	IPI1:         0       Function call interrupts
	IPI2:         0       CPU stop interrupts
	IPI3:         0       CPU stop (for crash dump) interrupts
	IPI4:         0       Timer broadcast interrupts
	IPI5:         0       IRQ work interrupts
	IPI6:         0       CPU wake-up interrupts
	Err:          0
	estuary:~$ 	

Thank you so much!

Best Regards,
Wei

> 
> Cheers
> ---Dave
> 
> diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
> index e7b5fc6..dd6c285 100644
> --- a/drivers/tty/serial/amba-pl011.c
> +++ b/drivers/tty/serial/amba-pl011.c
> @@ -1672,9 +1672,8 @@ static int pl011_hwinit(struct uart_port *port)
>  
>  	uap->port.uartclk = clk_get_rate(uap->clk);
>  
> -	/* Clear pending error and receive interrupts */
> -	pl011_write(UART011_OEIS | UART011_BEIS | UART011_PEIS |
> -		    UART011_FEIS | UART011_RTIS | UART011_RXIS,
> +	/* Clear pending error interrupts */
> +	pl011_write(UART011_OEIS | UART011_BEIS | UART011_PEIS | UART011_FEIS,
>  		    uap, REG_ICR);
>  
>  	/*
> 
> [...]
> 
> .
> 




More information about the linux-arm-kernel mailing list