[arm arch] local_irq_restore() does not play well with local_fiq_enable()

Jonathan Bell jonathan at raspberrypi.org
Thu Mar 20 15:54:02 EDT 2014


Hi.

I believe I've found an edge case where the FIQ enable bit in the CPSR for  
armv6 can get trampled through the use of a combination of the standard  
kernel interfaces and the arch-specific local_fiq_enable/local_fiq_disable  
macros.

We have an out-of-tree set of drivers on our mach-bcm2708 port that result  
in a bad interaction which disables FIQs for extended periods of time.

The below cliffnotes version is for readability, the full set of relevant  
code can be found at:

https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/misc/vc04_services/interface/vchiq_arm/
and
https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/usb/host/dwc_otg/dwc_otg_hcd_linux.c

In driver #1 (BRCM vchiq messagebox/GPU message passing handler) we do  
this:

vchiq_doorbell_irq:
	- Has the doorbell been poked from the GPU?
	- If yes
		up(vhciq_semaphore)
		return IRQ_HANDLED


in the corresponding "bottom half" we do something a bit silly but  
basically:


vchiq_worker_kthread:
	while (1)
		down_interruptible(vchiq_semaphore)
		process_gpu_messages()
		do_some_other_stuff()


In driver #2 (our dwc_otg/dwc2 implementation)	 we do the following on  
init:

hcd_init:
	setup the FIQ state
	install the fiq handler
	enable_fiq(USB)
	local_fiq_enable()
	usb_add_hcd()

Because the vchiq_worker_thread is waiting on a mailbox interrupt from the  
GPU, it initialises and goes to sleep almost immediately. In  
down_interruptible, there is a sequence of events (if the sem->count is  
<=0) that basically does

local_irq_save(flags)
local_irq_enable()
schedule();
local_irq_disable()
local_irq_restore(flags)

which then continues with the rest of the boot process, and probes USB  
which enables the FIQ.

The problem we get is that on boot, the vchiq services are probed and  
initialised before the USB driver is. A stale flags variable (with F bit  
set) borks things up as far as the FIQ is concerned.

On the first GPU interrupt, the vchiq_doorbell_irq increments the  
semaphore and wakes up the vchiq_worker_thread. Which then promptly  
overwrites the FIQ bit in the CPSR because the flags were saved before the  
FIQ was enabled.

The stale F bit then has the potential to propagate through other threads  
using similar mechanisms and causes no end of trouble to a FIQ handler  
that expects never to be disabled.

In our out-of-tree arch, I've "fixed" this by making local_irq_restore  
never touch the FIQ bit in CPSR.

See
https://github.com/raspberrypi/linux/commit/a0f47344e286768e3ce96268eed1ad0f6cfd9f2c
for the modification to arm/irqflags.h.

Should this be classed as a bug? The root cause is that the same register  
is used to enable priority FIQs as well as normal IRQs - thus the  
save(flags) and restore(flags) mechanisms trample on something that they  
don't know about.



More information about the linux-arm-kernel mailing list