[PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
Oguz, Yigit
yigitogu at amazon.de
Fri May 22 08:45:18 PDT 2026
> Not an Intel iommu expert, but I have concerns about using
> pci_get_domain_bus_and_slot() in this path.
>
> AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
> family of functions iterates the global PCI klist. It eventually calls
> bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
> which isn't IRQ-safe. Same thing with klist_put() [2], called in klist_iter_exit().
Yes, confirmed. bus_to_subsys() takes a non-IRQ-safe spinlock, so this
is indeed broken in hard IRQ context.
> Same here, pci_dev_put() calls put_device(), which might sleep [3] and hence
> shouldn't be called in hard IRQ context.
Agreed.
I looked at converting this to request_threaded_irq() so the handler
runs in process context, but the DMAR fault interrupt is registered
early in boot before kthreads exist. Rearranging the boot sequence just
to enrich a log message isn't feasible.
I also considered a manual linear search, walking the PCI bus and device
lists to find the matching BDF. But on systems with hundreds of devices
registered, that's too much time spent in hard IRQ context.
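
For illustration only, the manual search mentioned above can be modelled
in user space as below. This is a minimal sketch, not kernel code:
struct dev_entry and find_by_bdf() are hypothetical names, the flat array
stands in for the kernel's per-bus device lists, and no locking is
modelled. It only shows that the cost grows linearly with the number of
registered devices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat record; the kernel keeps devices on per-bus lists. */
struct dev_entry {
	uint16_t segment;	/* PCI domain */
	uint8_t  bus;
	uint8_t  devfn;		/* device in bits 7:3, function in bits 2:0 */
	uint16_t vendor;
	uint16_t device;
};

/*
 * Walk the table and return the entry matching segment + source_id,
 * or NULL if none matches. Worst case touches all n entries.
 */
const struct dev_entry *find_by_bdf(const struct dev_entry *tbl, size_t n,
				    uint16_t segment, uint16_t source_id)
{
	uint8_t bus = source_id >> 8;
	uint8_t devfn = source_id & 0xff;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].segment == segment && tbl[i].bus == bus &&
		    tbl[i].devfn == devfn)
			return &tbl[i];
	}
	return NULL;
}
```
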
Do you (or anyone on the list) have ideas for a clean way to get
the vendor:device ID in this context?
Thanks,
Yigit
On Wed, May 06, 2026 at 03:05:38PM +0000, Yigit Oguz wrote:
> Include the full SSSS:BB:DD.F address with PCI segment and
> vendor:device ID (VVVV:DDDD) in DMAR fault messages. Uses
> iommu->segment for the PCI domain and pci_get_domain_bus_and_slot
> to look up the pci_dev. Falls back to segment:BDF without
> vendor:device if the device is not found.
>
> This brings Intel IOMMU fault logging in line with the ARM SMMUv3
> event decoding, making it easier to identify faulting devices
> (e.g. after FLR) without cross-referencing lspci.
>
> Before:
> DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
> [fault reason 0x05] PTE Write access is not set
>
> After:
> DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
> [fault reason 0x05] PTE Write access is not set
>
> Signed-off-by: Yigit Oguz <yigitogu at amazon.de>
> Signed-off-by: Lilit Janpoladyan <lilitj at amazon.com>
> Assisted-by: Claude:claude-4.6-opus
> ---
> drivers/iommu/intel/dmar.c | 33 +++++++++++++++++++++------------
> 1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index d33c119a935e..225fa498d714 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1890,30 +1890,39 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
> {
> const char *reason;
> int fault_type;
> + u8 bus = source_id >> 8;
> + u8 devfn = source_id & 0xFF;
> + struct pci_dev *pdev;
> + char devid[48];
Why not have a #define for this like you have for AMD and Arm?
>
> reason = dmar_get_fault_reason(fault_reason, &fault_type);
>
> + pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
Not an Intel iommu expert, but I have concerns about using
pci_get_domain_bus_and_slot() in this path.
AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
family of functions iterates the global PCI klist. It eventually calls
bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
which isn't IRQ-safe. Same thing with klist_put() [2], called in klist_iter_exit().
> + if (pdev) {
> + snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d %04x:%04x",
> + iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> + pdev->vendor, pdev->device);
> + pci_dev_put(pdev);
Same here, pci_dev_put() calls put_device(), which might sleep [3] and hence
shouldn't be called in hard IRQ context.
> + } else {
> + snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d",
> + iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> + }
> +
> if (fault_type == INTR_REMAP) {
> - pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
> - source_id >> 8, PCI_SLOT(source_id & 0xFF),
> - PCI_FUNC(source_id & 0xFF), addr >> 48,
> - fault_reason, reason);
> + pr_err("[INTR-REMAP] Request device [%s] fault index 0x%llx [fault reason 0x%02x] %s\n",
> + devid, addr >> 48, fault_reason, reason);
>
> return 0;
> }
>
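
The string formatting in the hunk above does not depend on the lookup
itself, so it can be tried in user space. The following is a minimal
sketch under stated assumptions: format_devid() and its have_ids flag are
hypothetical (have_ids stands in for "pdev was found"), and SLOT_OF() and
FUNC_OF() are local stand-ins for the kernel's PCI_SLOT() and PCI_FUNC().
The format strings are the ones from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* User-space stand-ins for the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define SLOT_OF(devfn)	(((devfn) >> 3) & 0x1f)
#define FUNC_OF(devfn)	((devfn) & 0x07)

/*
 * Build "SSSS:BB:DD.F VVVV:DDDD" when IDs are available, else "SSSS:BB:DD.F".
 * source_id carries the bus in bits 15:8 and devfn in bits 7:0.
 * Returns what snprintf() returns.
 */
int format_devid(char *buf, size_t len, uint16_t segment, uint16_t source_id,
		 int have_ids, uint16_t vendor, uint16_t device)
{
	uint8_t bus = source_id >> 8;
	uint8_t devfn = source_id & 0xff;

	if (have_ids)
		return snprintf(buf, len, "%04x:%02x:%02x.%d %04x:%04x",
				(unsigned int)segment, (unsigned int)bus,
				(unsigned int)SLOT_OF(devfn), FUNC_OF(devfn),
				(unsigned int)vendor, (unsigned int)device);

	/* Fallback mirrors the patch's else branch: address only. */
	return snprintf(buf, len, "%04x:%02x:%02x.%d",
			(unsigned int)segment, (unsigned int)bus,
			(unsigned int)SLOT_OF(devfn), FUNC_OF(devfn));
}
```

With segment 0, source_id 0x8600 and IDs 8086:1533, this yields the
"0000:86:00.0 8086:1533" string shown in the commit message's "After"
example.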
[-------------- >8 -------------------]
Thanks,
Praan
[1] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/bus.c#L60
[2] https://elixir.bootlin.com/linux/v7.0.1/source/lib/klist.c#L209
[3] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/core.c#L3794