[arm arch] local_irq_restore() does not play well with local_fiq_enable()
Jonathan Bell
jonathan at raspberrypi.org
Thu Mar 20 15:54:02 EDT 2014
Hi.
I believe I've found an edge case where the FIQ enable bit in the CPSR for
armv6 can get trampled through the use of a combination of the standard
kernel interfaces and the arch-specific local_fiq_enable/local_fiq_disable
macros.
We have an out-of-tree set of drivers on our mach-bcm2708 port that result
in a bad interaction which disables FIQs for extended periods of time.
The below cliffnotes version is for readability, the full set of relevant
code can be found at:
https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/misc/vc04_services/interface/vchiq_arm/
and
https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/usb/host/dwc_otg/dwc_otg_hcd_linux.c
In driver #1 (BRCM vchiq messagebox/GPU message passing handler) we do
this:
vchiq_doorbell_irq:
- Has the doorbell been poked from the GPU?
- If yes
up(vhciq_semaphore)
return IRQ_HANDLED
in the corresponding "bottom half" we do something a bit silly but
basically:
vchiq_worker_kthread:
while (1)
down_interruptible(vchiq_semaphore)
process_gpu_messages()
do_some_other_stuff()
In driver #2 (our dwc_otg/dwc2 implementation) we do the following on
init:
hcd_init:
setup the FIQ state
install the fiq handler
enable_fiq(USB)
local_fiq_enable()
usb_add_hcd()
Because the vchiq_worker_thread is waiting on a mailbox interrupt from the
GPU, it initialises and goes to sleep almost immediately. In
down_interruptible, there is a sequence of events (if the sem->count is
<=0) that basically does
local_irq_save(flags)
local_irq_enable()
schedule();
local_irq_disable()
local_irq_restore(flags)
which then continues with the rest of the boot process, and probes USB
which enables the FIQ.
The problem we get is that on boot, the vchiq services are probed and
initialised before the USB driver is. A stale flags variable (with F bit
set) borks things up as far as the FIQ is concerned.
On the first GPU interrupt, the vchiq_doorbell_irq increments the
semaphore and wakes up the vchiq_worker_thread. Which then promptly
overwrites the FIQ bit in the CPSR because the flags were saved before the
FIQ was enabled.
The stale F bit then has the potential to propagate through other threads
using similar mechanisms and causes no end of trouble to a FIQ handler
that expects never to be disabled.
In our out-of-tree arch, I've "fixed" this by making local_irq_restore
never touch the FIQ bit in CPSR.
See
https://github.com/raspberrypi/linux/commit/a0f47344e286768e3ce96268eed1ad0f6cfd9f2c
for the modification to arm/irqflags.h.
Should this be classed as a bug? The root cause is that the same register
is used to enable priority FIQs as well as normal IRQs - thus the
save(flags) and restore(flags) mechanisms trample on something that they
don't know about.
More information about the linux-arm-kernel
mailing list