[PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails
Nicolin Chen
nicolinc at nvidia.com
Mon May 18 20:38:59 PDT 2026
It is unusual for the command queue to fail the queue_has_space() test. It
means something is stalling the HW so that the queue does not advance.
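For reference, queue_has_space() decides whether n more commands fit between
PROD and CONS, each of which carries an index plus a wrap bit. Below is a
simplified userspace model of that check. It assumes a queue of
1 << max_n_shift entries with the wrap bit directly above the index bits. The
struct ll_queue, q_idx() and q_wrp() names are illustrative and are not the
driver's exact definitions. If the HW stops advancing CONS, the computed space
cannot grow, so the check keeps failing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of an SMMUv3-style circular queue. PROD and CONS each
 * hold an index in the low max_n_shift bits plus one wrap bit just above. */
struct ll_queue {
	uint32_t prod;
	uint32_t cons;
	uint32_t max_n_shift; /* queue holds 1 << max_n_shift entries */
};

static uint32_t q_idx(const struct ll_queue *q, uint32_t p)
{
	return p & ((1u << q->max_n_shift) - 1);
}

static uint32_t q_wrp(const struct ll_queue *q, uint32_t p)
{
	return p & (1u << q->max_n_shift);
}

/* Return true if at least n free slots remain between PROD and CONS. */
static bool queue_has_space(const struct ll_queue *q, uint32_t n)
{
	uint32_t prod = q_idx(q, q->prod);
	uint32_t cons = q_idx(q, q->cons);
	uint32_t space;

	if (q_wrp(q, q->prod) == q_wrp(q, q->cons))
		/* Same lap: prod - cons entries are in use. */
		space = (1u << q->max_n_shift) - (prod - cons);
	else
		/* PROD is one lap ahead: only cons - prod slots are free. */
		space = cons - prod;

	return space >= n;
}
```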
One possible scenario today: arm_smmu_cmdq_issue_cmdlist() may be called in
an IRQ context where IRQs are already disabled, e.g. from ata_sg_clean() in
drivers/ata/libata-core.c.
When GERROR is affined to the CPU currently running with IRQs disabled, the
GERROR ISR will not run and a CERROR_ILL will not be cleared, which stalls
the HW; arm_smmu_cmdq_poll_until_not_full() then either times out or loops
without seeing CONS advance.
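The IRQ part of this scenario can be modeled with a boolean mask.
local_irq_restore() returns to whatever state local_irq_save() recorded, so
restoring before polling only re-enables IRQs if the caller had them enabled.
The sketch below is a hypothetical single-CPU model. It assumes GERROR is
affined to the polling CPU and that the ISR's only effect is to clear the
pending error. None of these names exist in the driver.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model of the IRQ mask and the GERROR state. */
static bool irqs_enabled = true;
static bool cmdq_err_pending; /* stands in for an uncleared CERROR_ILL */

/* Models local_irq_save(): record the previous state, then mask IRQs. */
static bool model_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Models local_irq_restore(): put back the state the caller had. */
static void model_irq_restore(bool flags)
{
	irqs_enabled = flags;
}

/* The GERROR ISR can only run, and clear the error, while IRQs are enabled. */
static void model_deliver_gerror_irq(void)
{
	if (irqs_enabled)
		cmdq_err_pending = false;
}

/* Returns true if the pending error got cleared while the issuer polled. */
static bool error_cleared_while_polling(bool caller_irqs_enabled)
{
	bool flags;

	irqs_enabled = caller_irqs_enabled;
	cmdq_err_pending = true;

	flags = model_irq_save();   /* issue path masks IRQs */
	model_irq_restore(flags);   /* space check failed: restore before polling */
	model_deliver_gerror_irq(); /* GERROR fires during the poll, if it can */
	return !cmdq_err_pending;
}
```

With a caller that entered with IRQs enabled the error is cleared; with a
caller that had already disabled them, it stays pending and CONS stays stuck.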
The window is narrow and this lockup is very difficult to trigger. However, a
subsequent change needs to serialize the STE update routines between the
attach_dev path (mutex-protected) and the invalidation path (not
mutex-protected), which makes a spin_lock_irqsave() unavoidable. That would
widen the currently narrow window to cover arm_smmu_write_ste() as well.
Since a cmdq_err_handler is available, call it whenever queue_has_space()
fails, to give the CMDQ hardware a chance to advance its CONS.
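A minimal userspace simulation of this change follows. It assumes a
single-threaded model in which the HW consumes one command per poll step and
makes no progress while a CMDQ error is pending. All names are hypothetical,
and model_err_handler() simply clears the pending error; what the real
cmdq_err_handler does is outside this patch. Without a handler, the poll
exhausts its budget with CONS stuck. With one, CONS advances and space frees
up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the "HW" stops advancing CONS while a CMDQ error is
 * pending, as with an uncleared CERROR_ILL. */
struct model_cmdq {
	uint32_t prod;    /* free-running count of entries written by SW */
	uint32_t cons;    /* free-running count of entries consumed by HW */
	uint32_t size;    /* queue capacity in entries */
	bool err_pending; /* GERROR.CMDQ_ERR raised but not yet handled */
	void (*err_handler)(struct model_cmdq *q); /* may be NULL */
};

/* Hypothetical handler: acknowledge the error so the HW can resume. */
static void model_err_handler(struct model_cmdq *q)
{
	q->err_pending = false;
}

/* One HW step: consume one entry unless stalled or empty. */
static void model_hw_step(struct model_cmdq *q)
{
	if (!q->err_pending && q->cons != q->prod)
		q->cons++;
}

static bool model_has_space(const struct model_cmdq *q, uint32_t n)
{
	return q->size - (q->prod - q->cons) >= n;
}

/* Poll until n slots are free; returns 0, or -1 if the budget runs out. */
static int model_poll_until_space(struct model_cmdq *q, uint32_t n, int budget)
{
	while (!model_has_space(q, n)) {
		if (budget-- == 0)
			return -1;
		model_hw_step(q);
	}
	return 0;
}

/* Mirrors the patched loop: call the error handler (if any) before polling. */
static int model_wait_for_space(struct model_cmdq *q, uint32_t n, int budget)
{
	while (!model_has_space(q, n)) {
		if (q->err_handler)
			q->err_handler(q);
		if (model_poll_until_space(q, n, budget))
			return -1; /* the driver logs "CMDQ timeout" and retries */
	}
	return 0;
}
```

Unlike the driver, which logs "CMDQ timeout" and keeps retrying, the model
returns -1 on the first timeout so the simulation always terminates.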
Signed-off-by: Nicolin Chen <nicolinc at nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7f81fd2e92480..0e4f34ed036c6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -723,6 +723,13 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
while (!queue_has_space(&llq, n + sync)) {
local_irq_restore(flags);
+ /*
+ * If the CMDQ is nearly full, it's possible that the HW
+ * is stalled by an unhandled GERROR_CMDQ_ERR. Thus give
+ * cmdq_err_handler a chance before each poll.
+ */
+ if (cmdq->cmdq_err_handler)
+ cmdq->cmdq_err_handler(smmu, cmdq);
if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
local_irq_save(flags);
--
2.43.0
More information about the linux-arm-kernel mailing list