[PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs
Robin Murphy
robin.murphy at arm.com
Mon May 18 08:52:57 PDT 2026
On 18/05/2026 4:19 pm, Oguz, Yigit wrote:
> On 2026-05-08, Robin Murphy wrote:
>> Sorry, but why are unexpected DMA faults happening "at scale" in the
>> first place? If you have so many broken drivers that disambiguating them
>> needs help from the kernel, something seems fundamentally wrong with
>> that picture. Conversely if these are devices assigned to userspace then
>> we should perhaps reconsider their ability to spam up the host kernel
>> log at will anyway.
>
> The use case is VFIO passthrough environments where translation faults
> show up during device lifecycle operations, mainly around device reset.
> When mappings are torn down and a device still has DMA in flight or
> issues DMA during/after FLR, the IOMMU blocks it and logs the fault.
> This series doesn't change when or whether events get logged, it just
> makes the existing lines more useful for triage when they do fire.
>
>> I'm not saying I necessarily have anything against this change in
>> particular, but it has a strong smell of effort being spent on the wrong
>> thing...
>
> Fair point. Whether the faults themselves should be addressed is a
> separate question, but since the kernel already logs them unconditionally,
> making the output more immediately useful seemed like low-hanging fruit.

TBH I think the more appropriate solution would be to have vfio-pci
register its own fault handler, wherein it can properly deal with
rate-limiting and/or entirely suppressing fault reports from misbehaving
userspace, and if and when it does want to log something it is then free
to do that in whatever format it wants, independent of the underlying
IOMMU driver.

Thanks,
Robin.

>> (And even then AFAICS it only really helps in the specific scenario of
>> having only one of each type of device, otherwise you're back to still
>> needing per-system knowledge of how BDFs map to physical instances to
>> know what's what.)
>
> The vendor:device ID answers the first question in triage: "what kind of
> device is this?" Even with multiple instances of the same type, narrowing
> by type cuts down the search space when correlating faults with device
> lifecycle events.
>
> Thanks,
> Yigit
>
>
> On 2026-05-06 4:05 pm, Yigit Oguz wrote:
>> IOMMU fault and event logs currently identify devices using only their
>> PCI segment/bus/device/function (SSSS:BB:DD.F). While mapping a single
>> BDF to a device type is straightforward, doing so at scale across many
>> hosts and thousands of fault events requires additional tooling and
>> manual cross-referencing. Including the vendor:device ID directly in
>> the log line makes each event self-contained and immediately actionable
>> without any post-processing.
>>
>> This series adds vendor:device ID (VVVV:DDDD) to IOMMU event logs for
>> ARM SMMUv3, Intel VT-d and AMD IOMMU.
>>
>> Before:
>> arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6
>> sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
>> DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
>> [fault reason 0x05] PTE Write access is not set
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 domain=0x000a
>> address=0xe0000000 flags=0x0020]
>>
>> After:
>> arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6 [8086:1533]
>> sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
>> DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
>> [fault reason 0x05] PTE Write access is not set
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 8086:1533 domain=0x000a
>> address=0xe0000000 flags=0x0020]
>>
>> Patch 1 adds vendor:device ID to ARM SMMUv3 translation fault logs.
>> Patch 2 adds PCI segment and vendor:device ID to Intel VT-d DMAR
>> fault logs.
>> Patch 3 adds a devid_str helper and vendor:device ID to all AMD IOMMU
>> event log paths.
>>
>> Testing:
>> Build-tested against mainline Linux (torvalds/master).
>>
>> Runtime-tested on a custom downstream branch on ARM SMMUv3, Intel VT-d and
>> AMD IOMMU hosts. Translation faults were induced in a virtualized setup
>> by removing DMA mappings for an in-use region, causing the assigned device's
>> subsequent DMA transactions to hit unmapped IOVAs and produce
>> translation fault events. The resulting log lines were verified to
>> contain the PCI vendor:device ID on all three platforms.
>>
>> Lilit Janpoladyan (1):
>> iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation
>> fault logs
>>
>> Yigit Oguz (2):
>> iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
>> iommu/amd: Add vendor:device ID to AMD IOMMU event logs
>>
>> drivers/iommu/amd/iommu.c | 94 +++++++++++++--------
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++-
>> drivers/iommu/intel/dmar.c | 33 +++++---
>> 3 files changed, 104 insertions(+), 52 deletions(-)
>>
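
For reference, the combined identifier shown in the quoted "After" log lines (segment:bus:device.function followed by vendor:device) can be produced with a single format string. The sketch below is a userspace illustration only, not the `devid_str` helper from patch 3; `struct pci_ident` and `format_devid()` are hypothetical names, and the layout simply mirrors the AMD-Vi sample above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the identifying fields of a PCI function. */
struct pci_ident {
	uint16_t segment; /* PCI segment (domain) */
	uint8_t bus;
	uint8_t dev;      /* device number, 5 bits */
	uint8_t fn;       /* function number, 3 bits */
	uint16_t vendor;  /* PCI vendor ID */
	uint16_t device;  /* PCI device ID */
};

/*
 * Write "SSSS:BB:DD.F VVVV:DDDD" into buf.
 * Returns the value of snprintf(), i.e. the length the full string
 * needs, which may exceed len if the buffer is too small.
 */
static int format_devid(char *buf, size_t len, const struct pci_ident *id)
{
	assert(buf != NULL && id != NULL);

	return snprintf(buf, len, "%04x:%02x:%02x.%x %04x:%04x",
			(unsigned int)id->segment,
			(unsigned int)id->bus,
			(unsigned int)(id->dev & 0x1f),
			(unsigned int)(id->fn & 0x7),
			(unsigned int)id->vendor,
			(unsigned int)id->device);
}
```

A 32-byte buffer is enough for the 22-character result plus the terminating NUL.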