[RFC PATCH v1 0/4] arm/arm64: fix a migrating irq bug when hotplug cpu

Yang Yingliang yangyingliang at huawei.com
Sun Sep 6 19:54:52 PDT 2015



On 2015/9/6 16:07, Jiang Liu wrote:
> On 2015/9/6 12:23, Yang Yingliang wrote:
>> Hi All,
>>
>> There is a bug:
>>
>> When a CPU is disabled, all of its IRQs are migrated to another CPU.
>> In some cases the new affinity differs from the old one, and it needs
>> to be copied to the IRQ's affinity mask. But if the IRQ is an LPI, its
>> affinity is not copied because of irq_set_affinity()'s return value.
>>
>>
>>
>> As Marc and Will suggested, I have refactored the arm/arm64 interrupt
>> migration code and fixed the IRQ migration bug that occurs when a CPU goes offline.
>>
>> I'm trying to let the core code handle interrupt migration. kernel/irq/migration.c
>> depends on CONFIG_GENERIC_PENDING_IRQ, so I make it selected by CONFIG_SMP and
>> CONFIG_HOTPLUG_CPU, and rename it to CONFIG_GENERIC_IRQ_MIGRATION to make it more general.
>> When CONFIG_GENERIC_IRQ_MIGRATION is enabled, an interrupt whose state_use_accessors
>> does not have IRQD_MOVE_PCNTXT set won't be migrated immediately in irq_set_affinity_locked().
>> So I introduce an irq_settings_set_move_pcntxt() helper to set that state in gic_irq_domain_map().
>>
>> With this preparation in place, I move the interrupt migration code into kernel/irq/migration.c
>> and fix the bug by using irq_do_set_affinity().
> Hi Yingliang,
> 	Since we are going to move migrate_irqs() into generic kernel
> code, and powerpc, metag, xtensa, sh, ia64 and mn10300 also define
> migrate_irqs(), it would be great if we could consolidate
> all of these.
> 	And as we are going to refine this code, there's another
> issue that needs attention. On x86, we need to allocate a CPU vector
> whenever an IRQ is directed to a CPU, so it is possible to
> run out of CPU vectors after a CPU is hot-removed. So we have a
> mechanism to detect whether we would run out of CPU vectors
> after removing a CPU, and to reject the CPU hot-removal if that
> would happen.
> 	So the key point is: if we need to allocate some sort
> of resource on the target CPUs for an IRQ, removing a CPU
> takes two steps:
> 1) check whether resources are still available after removing the CPU,
>     and reject the CPU removal request if we would run out of them;
> 2) fix up IRQs after hot-removing the CPU.
> Thanks!
> Gerry
>

On arm, as far as I know, an IRQ doesn't need any extra resource on the target CPU.
I am not sure whether any platform other than x86 needs this approach.

I think we could consolidate all the migrate_irqs() implementations later. I am
not sure it's a good idea to make such a big change, and to modify other
architectures' code, in a patchset that is supposed to fix an arm bug.




More information about the linux-arm-kernel mailing list