[PATCH 0/37] PCI/MSI: Enforce explicit IRQ vector management by removing devres auto-free
Shawn Lin
shawn.lin at rock-chips.com
Mon Feb 23 18:29:37 PST 2026
On 2026/02/24 (Tue) 1:38, Andy Shevchenko wrote:
> On Tue, Feb 24, 2026 at 12:09:37AM +0800, Shawn Lin wrote:
>> On 2026/02/23 (Mon) 23:50, Andy Shevchenko wrote:
>>> On Mon, Feb 23, 2026 at 5:32 PM Shawn Lin <shawn.lin at rock-chips.com> wrote:
>>>>
>>>> This patch series addresses a long-standing design issue in the PCI/MSI
>>>> subsystem where the implicit, automatic management of IRQ vectors by
>>>> the devres framework conflicts with explicit driver cleanup, creating
>>>> ambiguity and potential resource management bugs.
>>>>
>>>> ==== The Problem: Implicit vs. Explicit Management ====
>>>> Historically, `pcim_enable_device()` not only manages standard PCI resources
>>>> (BARs) via devres but also implicitly triggers automatic IRQ vector management
>>>> by setting a flag that registers `pcim_msi_release()` as a cleanup action.
>>>>
>>>> This creates an ambiguous ownership model. Many drivers follow a pattern of:
>>>> 1. Calling `pci_alloc_irq_vectors()` to allocate interrupts.
>>>> 2. Also calling `pci_free_irq_vectors()` in their error paths or remove routines.
>>>>
>>>> When such a driver also uses `pcim_enable_device()`, the devres framework may
>>>> attempt to free the IRQ vectors a second time upon device release, leading to
>>>> a double-free. Analysis of the tree shows this hazardous pattern exists widely,
>>>> while 35 other drivers correctly rely solely on the implicit cleanup.
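To make the hazard concrete, here is a minimal userspace model in C of the pre-series behaviour as described above. It is a sketch under stated assumptions: `struct model_dev` and every `model_*` function are hypothetical, only loosely mirror `pcim_enable_device()`, `pci_alloc_irq_vectors()`, `pci_free_irq_vectors()` and the devres release via `pcim_msi_release()`, and simply count free attempts. Whether a second free actually corrupts state in a real kernel depends on the core's own guards, which the model does not capture.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the pre-series behaviour.
 * None of these functions exist in the kernel.
 */
struct model_dev {
	bool is_managed;        /* set by model_pcim_enable_device() */
	bool msi_release_armed; /* devres action registered for the vectors */
	int nvec;               /* IRQ vectors currently allocated */
	int free_attempts;      /* total calls that tried to free vectors */
	int double_frees;       /* attempts made when nothing was allocated */
};

static void model_pcim_enable_device(struct model_dev *d)
{
	d->is_managed = true;
}

/* Pre-series rule: a managed device silently arms devres cleanup. */
static int model_alloc_irq_vectors(struct model_dev *d, int nvec)
{
	assert(nvec > 0);
	if (d->is_managed)
		d->msi_release_armed = true;
	d->nvec = nvec;
	return nvec;
}

static void model_free_irq_vectors(struct model_dev *d)
{
	d->free_attempts++;
	if (d->nvec == 0)
		d->double_frees++; /* nothing left to free: second free */
	d->nvec = 0;
}

/* Runs the devres actions, as happens when the driver is unbound. */
static void model_devres_release(struct model_dev *d)
{
	if (d->msi_release_armed)
		model_free_irq_vectors(d);
	d->msi_release_armed = false;
}

/* The hybrid driver pattern: managed enable, explicit free in remove(). */
static void model_hybrid_driver_lifecycle(struct model_dev *d)
{
	model_pcim_enable_device(d);
	model_alloc_irq_vectors(d, 4);
	model_free_irq_vectors(d); /* driver's own remove() path */
	model_devres_release(d);   /* devres frees a second time */
}
```

Running `model_hybrid_driver_lifecycle()` on a zeroed `struct model_dev` leaves `free_attempts == 2` and `double_frees == 1`: both the driver's explicit free and the devres release run.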
>>>
>>> Is this confirmed? From what I read in the cover letter, this series was
>>> only compile-tested, so how can you prove the problem exists in the
>>> first place?
>>
>> Yes, it's confirmed. Debugging a double-free issue in an out-of-tree
>> PCIe Wi-Fi driver that uses
>> pcim_enable_device + pci_alloc_irq_vectors + pci_free_irq_vectors exposed
>> it. And we already had a TODO to clean up this hybrid usage, targeted for
>> this cycle [1], as suggested by Philipp:
>
> Okay, fair enough. I think this bit was missing in the cover letter.
>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=msi
>
>>>> ==== The Solution: Making Management Explicit ====
>>>> This series enforces a clear, predictable model:
>>>> 1. New Managed API (Patch 1/37): Introduces pcim_alloc_irq_vectors() and
>>>> pcim_alloc_irq_vectors_affinity(). Drivers that desire devres-managed IRQ
>>>> vectors should use these functions, which set the is_msi_managed flag and
>>>> ensure automatic cleanup.
>>>> 2. Patches 2 through 36 convert each driver that uses pcim_enable_device() alongside
>>>> pci_alloc_irq_vectors() and relies on devres for IRQ vector cleanup to instead
>>>> make an explicit call to pcim_alloc_irq_vectors().
>>>> 3. Core Change (Patch 37/37): With the conversions above in place, modifies pcim_setup_msi_release()
>>>> to check only the is_msi_managed flag. This decouples automatic IRQ cleanup from
>>>> pcim_enable_device(). IRQ vectors allocated via pci_alloc_irq_vectors*()
>>>> are now solely the driver's responsibility to free with pci_free_irq_vectors().
>>>>
>>>> With these changes, we get a clear ownership model: explicit resource management eliminates
>>>> ambiguity and follows the "principle of least surprise." New drivers should choose one model and
>>>> stay consistent:
>>>> - Use `pci_alloc_irq_vectors()` + `pci_free_irq_vectors()` for explicit control.
>>>> - Use `pcim_alloc_irq_vectors()` for devres-managed, automatic cleanup.
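The ownership rule after the series can be sketched with a similar userspace model. Again, `struct model_dev` and all `model_*` helpers are illustrative assumptions, not kernel code: `model_setup_msi_release()` stands in for the modified `pcim_setup_msi_release()` and consults only `is_msi_managed`, and `model_pcim_alloc_irq_vectors()` stands in for the new managed allocator, which per the cover letter sets that flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the post-series ownership rule.
 * Names loosely mirror the cover letter; none of this is kernel code.
 */
struct model_dev {
	bool is_managed;        /* still set by pcim_enable_device(), never read */
	bool is_msi_managed;    /* set only by the managed allocator */
	bool msi_release_armed; /* devres action registered */
	int nvec;               /* IRQ vectors currently allocated */
	int double_frees;       /* frees attempted with nothing allocated */
};

/* Core change: only is_msi_managed arms the devres release. */
static void model_setup_msi_release(struct model_dev *d)
{
	if (d->is_msi_managed)
		d->msi_release_armed = true;
}

static int model_alloc_common(struct model_dev *d, int nvec)
{
	assert(nvec > 0);
	d->nvec = nvec;
	model_setup_msi_release(d);
	return nvec;
}

/* Explicit model: the driver owns the vectors and must free them. */
static int model_alloc_irq_vectors(struct model_dev *d, int nvec)
{
	return model_alloc_common(d, nvec);
}

/* Managed model: devres frees the vectors when the driver unbinds. */
static int model_pcim_alloc_irq_vectors(struct model_dev *d, int nvec)
{
	d->is_msi_managed = true;
	return model_alloc_common(d, nvec);
}

static void model_free_irq_vectors(struct model_dev *d)
{
	if (d->nvec == 0)
		d->double_frees++;
	d->nvec = 0;
}

static void model_devres_release(struct model_dev *d)
{
	if (d->msi_release_armed)
		model_free_irq_vectors(d);
	d->msi_release_armed = false;
}
```

A device with `is_managed` set that uses the explicit alloc/free pair ends with no devres action armed and no double free, while one that uses `model_pcim_alloc_irq_vectors()` and never frees explicitly has its vectors released by `model_devres_release()`. The `is_managed` field is deliberately never read, which models the decoupling from `pcim_enable_device()` that Patch 37/37 describes.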
>>>
>>> Have you checked previous attempts? Why is your series better than those?
>>
Thanks for sharing this five-year-old discussion; I had totally missed it.
I read the V7 discussion, and it seems to have disappeared without much
follow-up, like a stone dropped into the ocean. For five years, newly
added drivers have continued to misuse these APIs, and we’ve been
watching it happen. I can’t really claim this patch series is
inherently better than Dejin’s earlier work at its core; it is just
about fixing one entire category of misuse in a single pass.
According to Bjorn's final search and reply, also removing the
deprecated APIs would require a massive amount of work and might span
many release cycles. Unfortunately, that work never began, and the
cleanup might never be completed. I’m not sure whether folks have
changed their minds since. Can we at least start by completing the
changes for the pci_alloc_irq_vectors category?
>> There seem to be no previous attempts.
>
> Maybe we are looking at different projects...
>
> https://lore.kernel.org/all/?q=pcim_alloc_irq_vectors
>
>>>> ==== Testing And Review ====
>>>> 1. This series has only been compile-tested with allmodconfig.
>>>> 2. Given the substantial size of this patch series, I have structured the mailing
>>>> to facilitate efficient review. The cover letter, the first patch and the last one will be sent
>>>> to all relevant mailing lists and key maintainers to ensure broad visibility and
>>>> initial feedback on the overall approach. The remaining subsystem-specific patches
>>>> will be sent only to the respective subsystem maintainers and their associated
>>>> mailing lists, reducing noise.
>
More information about the linux-riscv mailing list