possible deadlock due to irq_set_thread_affinity() calling into the scheduler (was Re: [PATCH v3 38/62] KVM: SVM: Take and hold ir_list_lock across IRTE updates in IOMMU)

Paolo Bonzini pbonzini at redhat.com
Mon Dec 22 06:09:13 PST 2025


On 12/22/25 10:16, Ankit Soni wrote:
>    ======================================================
>    WARNING: possible circular locking dependency detected
>    6.19.0-rc2 #20 Tainted: G            E
>    ------------------------------------------------------
>    CPU 58/KVM/28597 is trying to acquire lock:
>      ff12c47d4b1f34c0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
> 
>      but task is already holding lock:
>      ff12c49b28552110 (&svm->ir_list_lock){....}-{2:2}, at: avic_pi_update_irte+0x147/0x270 [kvm_amd]
> 
>      which lock already depends on the new lock.
> 
>    Chain exists of:
>      &irq_desc_lock_class --> &rq->__lock --> &svm->ir_list_lock
> 
>    Possible unsafe locking scenario:
> 
>          CPU0                            CPU1
>          ----                            ----
>     lock(&svm->ir_list_lock);
>                                        lock(&rq->__lock);
>                                        lock(&svm->ir_list_lock);
>     lock(&irq_desc_lock_class);
> 
>          *** DEADLOCK ***
> 
> So lockdep sees:
> 
>    &irq_desc_lock_class -> &rq->__lock -> &svm->ir_list_lock
> 
> while avic_pi_update_irte() currently holds svm->ir_list_lock and then
> takes irq_desc_lock via irq_set_vcpu_affinity(), which creates the
> potential inversion.
> 
>    - Is this lockdep warning expected/benign in this code path, or does it
>      indicate a real potential deadlock between svm->ir_list_lock and
>      irq_desc_lock with AVIC + irq_bypass + VFIO?

I'd treat it as a potential (if unlikely) deadlock:

(a) irq_set_thread_affinity() triggers the scheduler via wake_up_process(),
while irq_desc->lock is held

(b) the scheduler calls into KVM with rq_lock taken, and KVM uses
ir_list_lock within __avic_vcpu_load/__avic_vcpu_put

(c) KVM wants to block scheduling for a while and uses ir_list_lock for
that purpose, but then calls irq_set_vcpu_affinity(), which takes
irq_desc->lock (see the sketch below).
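
Spelled out as call chains (a minimal sketch reconstructed from the report
above; the intermediate frames are abbreviated and may not match the exact
source):

   (a) irq_set_affinity()
         raw_spin_lock(&desc->lock)
         irq_set_thread_affinity(desc)
           wake_up_process(action->thread)        --> takes rq->__lock

   (b) schedule() / context switch                 rq->__lock held
         kvm_sched_in() / kvm_sched_out()
           __avic_vcpu_load() / __avic_vcpu_put()
             spin_lock(&svm->ir_list_lock)

   (c) avic_pi_update_irte()
         spin_lock_irqsave(&svm->ir_list_lock, ...)
         irq_set_vcpu_affinity()
           __irq_get_desc_lock()                   --> takes desc->lock

which closes the cycle desc->lock -> rq->__lock -> ir_list_lock -> desc->lock.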

I don't think there's an alternative choice of lock for (c); and there's
no easy way to pull the irq_desc->lock out of the IRQ subsystem--in fact
the stickiness of the situation comes from rq->__lock and irq_desc->lock
both being internal locks and yet not leaf locks.

Of the three, the sketchiest is (a); notably, __setup_irq() calls
wake_up_process() outside desc->lock.  Therefore I'd very much like to
treat it as a kernel/irq/ bug; and the simplest (perhaps too simple...)
fix is to drop the wake_up_process().  The only cost is extra latency on
the next interrupt after an affinity change.
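
The threaded handler still notices the change: IRQTF_AFFINITY stays set
and is consumed the next time the thread runs, roughly like this
(paraphrasing kernel/irq/manage.c from memory, so the exact shape may
differ):

	static int irq_thread(void *data)
	{
		...
		while (!irq_wait_for_interrupt(...)) {
			/* sees IRQTF_AFFINITY set above and re-binds the thread */
			irq_thread_check_affinity(desc, action);
			...
		}
	}

so the affinity update simply takes effect at the next interrupt instead
of immediately.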

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8b1b4c8a4f54..fc135bd079a4 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -189,14 +189,10 @@ static void irq_set_thread_affinity(struct irq_desc *desc)
  	struct irqaction *action;
  
  	for_each_action_of_desc(desc, action) {
-		if (action->thread) {
+		if (action->thread)
  			set_bit(IRQTF_AFFINITY, &action->thread_flags);
-			wake_up_process(action->thread);
-		}
-		if (action->secondary && action->secondary->thread) {
+		if (action->secondary && action->secondary->thread)
  			set_bit(IRQTF_AFFINITY, &action->secondary->thread_flags);
-			wake_up_process(action->secondary->thread);
-		}
  	}
  }
  
Marc, what do you think?

Paolo



