[PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails

Nicolin Chen nicolinc at nvidia.com
Mon May 18 20:38:59 PDT 2026


It's unusual when the command queue fails the queue_has_space() test. There
must be something stalling the HW so the queue does not advance.

Currently, a possible scenario: arm_smmu_cmdq_issue_cmdlist() may be called
in an IRQ context where IRQ is already disabled. E.g., ata_sg_clean() in
drivers/ata/libata-core.c

When GERROR is affined to the CPU currently running with IRQs disabled, the
GERROR ISR will not run and a CERROR_ILL will not be cleared, which stalls
the HW; arm_smmu_cmdq_poll_until_not_full() then either times out or loops
without seeing CONS advance.

The window is narrow and it's very difficult to trigger this lockup. Yet, a
subsequent change requires serializing the STE update routines between the
attach_dev path (mutex-ed) and the invalidation path (non-mutexed), where a
spin_lock_irqsave is inevitable. And this would expand the currently narrow
window to a wider range -- arm_smmu_write_ste() as well.

Since we have a cmdq_err_handler, call it when queue_has_space() fails, to
give the CMDQ hardware a chance to advance its CONS.

Signed-off-by: Nicolin Chen <nicolinc at nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7f81fd2e92480..0e4f34ed036c6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -723,6 +723,13 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 
 		while (!queue_has_space(&llq, n + sync)) {
 			local_irq_restore(flags);
+			/*
+			 * If the CMDQ is nearly full, it's possible that the HW
+			 * is stalled by an unhandled GERROR_CMDQ_ERR. Thus give
+			 * cmdq_err_handler a chance before each poll.
+			 */
+			if (cmdq->cmdq_err_handler)
+				cmdq->cmdq_err_handler(smmu, cmdq);
 			if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
 				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
 			local_irq_save(flags);
-- 
2.43.0




More information about the linux-arm-kernel mailing list