[PATCH rc v4 0/5] iommu/arm-smmu-v3: Fix device crash on kdump kernel
Nicolin Chen
nicolinc at nvidia.com
Wed Apr 29 00:20:48 PDT 2026
When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.
In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
PCIe AER or SErrors that may panic the kdump kernel.
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.
To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
flight DMA using the crashed kernel's page tables until the endpoint device
drivers probe and quiesce their respective hardware.
However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.
This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.
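The adoption preconditions described above and in the changelog below can be sketched roughly as follows. This is a minimal userspace model, not the code in the series: `struct smmu_probe_state` and `smmu_should_adopt()` are hypothetical names, and the `gerror` field is assumed to already hold the active error bits. The two bit definitions follow the upstream driver header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as defined in the upstream arm-smmu-v3.h. */
#define CR0_SMMUEN	(1u << 0)
#define GERROR_SFM_ERR	(1u << 8)

/* Hypothetical snapshot of what probe() can observe; not a driver struct. */
struct smmu_probe_state {
	bool is_kdump_kernel;	/* running as the crash-capture kernel */
	bool coherent;		/* SMMU table walks are cache coherent */
	uint32_t cr0;		/* CR0 as left by the crashed kernel */
	uint32_t gerror;	/* active global error bits (assumption) */
};

/* Returns true if the crashed kernel's stream table should be adopted. */
static bool smmu_should_adopt(const struct smmu_probe_state *s)
{
	assert(s != NULL);

	if (!s->is_kdump_kernel)
		return false;	/* normal boot: full reset as usual */
	if (!(s->cr0 & CR0_SMMUEN))
		return false;	/* SMMU was off, nothing to preserve */
	if (s->gerror & GERROR_SFM_ERR)
		return false;	/* Service Failure Mode: do not adopt */
	if (!s->coherent)
		return false;	/* this series only covers coherent SMMUs */
	return true;
}
```

Any case that fails one of these checks falls back to the existing full-reset path.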
In this series:
- Introduce an ARM_SMMU_OPT_KDUMP_ADOPT option
- Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
- Skip EVENTQ and PRIQ setups including interrupts and their handlers
- Memremap the crashed kernel's stream tables into the kdump kernel [*]
- Defer any default domain attachment to retain STEs until device drivers
explicitly request it.
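As a rough illustration of the reset change listed above, the sketch below models which CR0 bits of the crashed kernel survive the start of the reset. It is a simplified userspace model under stated assumptions: `smmu_reset_initial_cr0()` is a hypothetical helper, not a function from the series, and it only covers the bits inherited from the crashed kernel (the kdump kernel still has to bring up its own command queue afterwards). The bit definitions follow the upstream driver header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR0 bit positions as defined in the upstream arm-smmu-v3.h. */
#define CR0_ATSCHK	(1u << 4)
#define CR0_CMDQEN	(1u << 3)
#define CR0_EVTQEN	(1u << 2)
#define CR0_PRIQEN	(1u << 1)
#define CR0_SMMUEN	(1u << 0)

/*
 * Hypothetical helper: compute the CR0 value left in place at the start
 * of reset. In adopt mode, translation (SMMUEN) and the ATS check
 * (ATSCHK) are retained so in-flight DMA keeps translating; the event
 * and PRI queues of the crashed kernel are not carried over. Otherwise
 * the SMMU is fully disabled, as in the existing reset path.
 */
static uint32_t smmu_reset_initial_cr0(uint32_t crashed_cr0, bool adopt)
{
	if (!adopt)
		return 0;

	return crashed_cr0 & (CR0_SMMUEN | CR0_ATSCHK);
}
```

For example, a crashed-kernel CR0 with all five bits set would be reduced to `CR0_SMMUEN | CR0_ATSCHK` in adopt mode and to 0 otherwise.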
[*] For verification reasons, this series only fixes coherent SMMUs.
For non-ARM_SMMU_OPT_KDUMP_ADOPT cases, keep the status quo since commit
3f54c447df34f ("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel"):
full reset followed by driver-initiated reattach, potentially rejecting any
in-flight DMA.
Note that the series requires Jason's work that was merged in v6.12: commit
85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
I have a backported version that is verified with a v6.8 kernel. I can send
it if we see a strong need after this version is accepted.
This is on GitHub:
https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v4
Changelog
v4
* Rebase on v7.1-rc1
* s/arm_smmu_adopt/arm_smmu_kdump_adopt
* Revert alloc/memremap/fmt on fallback
* Reorder patches to avoid bisect regression
* Use IRQ_NONE for spurious evtq/priq entries
* Cap linear log2size by kdump's allocation bound
* Defer clearing FEAT_2_LVL_STRTAB on linear adopt
* Add arm_smmu_kdump_phys_is_corrupted() validation
* Defer l2 stream table memremap till master inserts
* Re-validate L1 desc on master insert with READ_ONCE
v3
https://lore.kernel.org/all/cover.1777150307.git.nicolinc@nvidia.com/
* s/OPT_KDUMP/OPT_KDUMP_ADOPT
* Do not adopt if GERROR_SFM_ERR
* Retain CR0_ATSCHK beside CR0_SMMUEN
* Clear latched GERROR bits (e.g. CMDQ_ERR)
* Assert ARM_SMMU_FEAT_COHERENCY in adopt functions
* Add STE.Cfg check in arm_smmu_is_attach_deferred()
* Fix validations on return codes from devm_memremap()
* Sanitize crashed kernel register values in adopt functions
* Drop unnecessary l2ptrs guard in arm_smmu_is_attach_deferred()
* Don't enable PRIQ/EVTQ irqs and guard the irq functions for combined
irq cases
v2
https://lore.kernel.org/all/cover.1776286352.git.nicolinc@nvidia.com/
* Add warning in non-coherent SMMU cases
* Keep eventq/priq disabled vs. enabling-and-disabling-later
* Check KDUMP option in the beginning of arm_smmu_device_reset()
* Validate STRTAB format matches HW capability instead of forcing flags
v1
https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/
Nicolin Chen (5):
iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump
iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP_ADOPT in probe()
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 475 +++++++++++++++++++-
2 files changed, 452 insertions(+), 24 deletions(-)
--
2.43.0