Regression with legacy IRQ numbers caused by 9a1091ef0017

Tony Lindgren tony at atomide.com
Fri Jan 16 09:29:06 PST 2015


* Russell King - ARM Linux <linux at arm.linux.org.uk> [150116 09:25]:
> On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> > * Russell King - ARM Linux <linux at arm.linux.org.uk> [150116 08:33]:
> > > I would still like to understand /why/ enabling preempt causes the error.
> > > Changing the preempt configuration really should not change what happens
> > > on the bus.  (Think about it.)  It's an indication that there is some
> > > other error present.
> > 
> > We have a wrong irq number caused by $subject. And the wrong irq
> > gets triggered before the dma hardware is configured during dma
> > init. And then we get the invalid access error from omap_l3_noc.
> 
> ... which should happen whether or not preempt is enabled, which is
> really my point.
> 
> We know tha the wrong IRQ gets requested by the driver - and that wrong
> IRQ is requested whether or not we have preempt enabled.  Yet we get
> the warning whether or not preempt is enabled.
> 
> The DMA handler is not registered as a threaded handler, so it's not
> depending on a context switch to execute omap2_dma_irq_handler().
> 
> Another reason why I don't agree with your explanation is that by the
> time setup_irq() is called, we have already poked at the DMA hardware
> several times - omap_clear_dma() and omap2_disable_irq_lch() will have
> been called for each DMA channel - and both will write to the hardware.
> 
> What's more is that the only things left after setup_irq() has been
> called is to possibly reserve the first two DMA channels and print
> the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
> which would "finish" any unfinished hardware initialisation.
> 
> The final reason I don't agree is that I've put a printk() in
> omap2_dma_irq_handler(), and this does not trigger.

Oh, yes that blows my theory completely then.
 
> So, I think this has nothing to do with the DMA hardware /at all/,
> but more to do with the GPIO code, and it suggests that the GPIO code
> publishes IRQs before it is safe for those IRQs to be used.
> 
> Maybe it has to do with omap_gpio_irq_handler() being called... added
> printk(), nope, that's not called either.  So it's not an IRQ which
> gets triggered at all.
> 
> What is called are (in order):
> 
> omap_gpio_unmask_irq()
> omap_set_gpio_irqenable()
> omap_enable_gpio_irqbank()
> 
> and this reveals where the problem is, especially when you then add
> instrumentation into the runtime PM functions - and this reveals that
> when a GPIO IRQ is requested, these functions are called while the
> GPIO is runtime suspended.
> 
> _That_ is where the *real* problem lies - requesting a GPIO interrupt
> results in the kernel touching possibly runtime-suspended hardware.
> 
> The reason it happens with preempt is that preempt introduces scheduling
> points during the kernel boot which would not otherwise be there (with
> preempt disabled, you have to hit an explicit context switch due to
> contention on some lock or a wait in order for some other thread to run.)

OK makes sense.
 
> So, the GPIO driver really needs fixing - and I'd suggest fixing it
> first, before fixing the DMA problem, because the DMA problem allows
> us to see the GPIO problem.

Yes we need to fix that.

Regards,

Tony



More information about the linux-arm-kernel mailing list