[PATCH 2/2] Debug handling of early spurious interrupts

Fri Jul 20 17:43:02 EDT 2007

On Fri, 20 Jul 2007 11:20:43 +0900
Fernando Luis V__zquez Cao <fernando at oss.ntt.co.jp> wrote:

> With the advent of kdump it is possible that device drivers receive
> interrupts generated in the context of a previous kernel. Ideally
> quiescing the underlying devices should suffice but not all drivers do
> this, either because it is not possible or because they did not
> contemplate this case. Thus drivers ought to be able to handle
> interrupts coming in as soon as the interrupt handler is registered.
> 
> Signed-off-by: Fernando Luis Vazquez Cao <fernando at oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-2.6.22-orig/kernel/irq/manage.c linux-2.6.22-pendirq/kernel/irq/manage.c
> --- linux-2.6.22-orig/kernel/irq/manage.c	2007-07-19 18:18:53.000000000 +0900
> +++ linux-2.6.22-pendirq/kernel/irq/manage.c	2007-07-19 19:43:41.000000000 +0900
> @@ -533,21 +533,32 @@ int request_irq(unsigned int irq, irq_ha
>  
>  	select_smp_affinity(irq);
>  
> -	if (irqflags & IRQF_SHARED) {
> -		/*
> -		 * It's a shared IRQ -- the driver ought to be prepared for it
> -		 * to happen immediately, so let's make sure....
> -		 * We do this before actually registering it, to make sure that
> -		 * a 'real' IRQ doesn't run in parallel with our fake
> -		 */
> -		if (irqflags & IRQF_DISABLED) {
> -			unsigned long flags;
> +	/*
> +	 * With the advent of kdump it possible that device drivers receive
> +	 * interrupts generated in the context of a previous kernel. Ideally
> +	 * quiescing the underlying devices should suffice but not all drivers
> +	 * do this, either because it is not possible or because they did not
> +	 * contemplate this case. Thus drivers ought to be able to handle
> +	 * interrupts coming in as soon as the interrupt handler is registered.
> +	 *
> +	 * Besides, if it is a shared IRQ the driver ought to be prepared for
> +	 * it to happen immediately too.
> +	 *
> +	 * We do this before actually registering it, to make sure that
> +	 * a 'real' IRQ doesn't run in parallel with our fake.
> +	 */
> +	if (irqflags & IRQF_DISABLED) {
> +		unsigned long flags;
>  
> -			local_irq_save(flags);
> -			handler(irq, dev_id);
> -			local_irq_restore(flags);
> -		} else
> -			handler(irq, dev_id);
> +		local_irq_save(flags);
> +		retval = handler(irq, dev_id);
> +		local_irq_restore(flags);
> +	} else
> +		retval = handler(irq, dev_id);
> +	if (retval == IRQ_HANDLED) {
> +		printk(KERN_WARNING
> +		       "%s (IRQ %d) handled a spurious interrupt\n",
> +		       devname, irq);
>  	}
>  

This change means that we'll run the irq handler at request_irq()-time even
for non-shared interrupts.

This is a bit of a worry.  See, shared-interrupt handlers are required to
be able to cope with being called when their device isn't interrupting. 
But nobody ever said that non-shared interrupt handlers need to be able to
cope with that.

Hence these non-shared handlers are within their rights to emit warning
printks, go BUG or accuse the operator of having busted hardware.

So bad things might happen because of this change.  And if they do, they
will take a loooong time to be discovered, because non-shared interrupt
handlers tend to dwell in crufty old drivers which not many people use.

So I'm wondering if it would be more prudent to only do all this for shared
handlers?