possible deadlock due to irq_set_thread_affinity() calling into the scheduler (was Re: [PATCH v3 38/62] KVM: SVM: Take and hold ir_list_lock across IRTE updates in IOMMU)
Paolo Bonzini
pbonzini at redhat.com
Mon Dec 22 06:09:13 PST 2025
On 12/22/25 10:16, Ankit Soni wrote:
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.19.0-rc2 #20 Tainted: G E
> ------------------------------------------------------
> CPU 58/KVM/28597 is trying to acquire lock:
> ff12c47d4b1f34c0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
>
> but task is already holding lock:
> ff12c49b28552110 (&svm->ir_list_lock){....}-{2:2}, at: avic_pi_update_irte+0x147/0x270 [kvm_amd]
>
> which lock already depends on the new lock.
>
> Chain exists of:
> &irq_desc_lock_class --> &rq->__lock --> &svm->ir_list_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&svm->ir_list_lock);
>                                lock(&rq->__lock);
>                                lock(&svm->ir_list_lock);
>   lock(&irq_desc_lock_class);
>
> *** DEADLOCK ***
>
> So lockdep sees:
>
> &irq_desc_lock_class -> &rq->__lock -> &svm->ir_list_lock
>
> while avic_pi_update_irte() currently holds svm->ir_list_lock and then
> takes irq_desc_lock via irq_set_vcpu_affinity(), which creates the
> potential inversion.
>
> - Is this lockdep warning expected/benign in this code path, or does it
> indicate a real potential deadlock between svm->ir_list_lock and
> irq_desc_lock with AVIC + irq_bypass + VFIO?
I'd treat it as a potential (if unlikely) deadlock:
(a) irq_set_thread_affinity() triggers the scheduler via wake_up_process()
while irq_desc->lock is held
(b) the scheduler calls into KVM with the rq lock held, and KVM takes
ir_list_lock within __avic_vcpu_load()/__avic_vcpu_put()
(c) KVM wants to block scheduling for a while and uses ir_list_lock for
that purpose, but then irq_set_vcpu_affinity() takes irq_desc->lock.
I don't think there's an alternative choice of lock for (c); and there's
no easy way to pull the irq_desc->lock out of the IRQ subsystem--in fact
the stickiness of the situation comes from rq->__lock and
irq_desc->lock both being internal and not leaf locks.
Of the three, (a) is the most sketchy; notably, __setup_irq() calls
wake_up_process() outside desc->lock. Therefore I'd very much like to
treat it as a kernel/irq/ bug; and the simplest (perhaps too simple...)
fix is to drop the wake_up_process(). The only cost is extra latency on
the next interrupt after an affinity change.
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8b1b4c8a4f54..fc135bd079a4 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -189,14 +189,10 @@ static void irq_set_thread_affinity(struct irq_desc *desc)
 {
 	struct irqaction *action;
 
 	for_each_action_of_desc(desc, action) {
-		if (action->thread) {
+		if (action->thread)
 			set_bit(IRQTF_AFFINITY, &action->thread_flags);
-			wake_up_process(action->thread);
-		}
-		if (action->secondary && action->secondary->thread) {
+		if (action->secondary && action->secondary->thread)
 			set_bit(IRQTF_AFFINITY, &action->secondary->thread_flags);
-			wake_up_process(action->secondary->thread);
-		}
 	}
 }
Marc, what do you think?
Paolo