[PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts

Robin Murphy robin.murphy at arm.com
Fri Mar 6 05:22:11 PST 2026


On 2026-03-05 11:41 pm, Jason Gunthorpe wrote:
> On Thu, Mar 05, 2026 at 01:15:45PM -0800, Nicolin Chen wrote:
> 
>> You mean in arm_smmu_cmdq_issue_cmdlist() that issued the timed
>> out ATC command?
> 
> Yes, it was my off hand thought.
> 
>> So my test case was to trigger a device fault followed by an ATC
>> command. But, I found that the ATC command submission returned 0
>> while only the ISR received:
>>      CMDQ error (cons 0x03000003): ATC invalidate timeout
>>      arm_smmu_debugfs_atc_write: ATC_INV ret=0
>>
>> It seems difficult to insert a CMDQ_OP_CFGI_STE in the submission
>> thread?
> 
> I didn't look, but I thought the CMDQ stops on the ATC invalidation,
> flags the error and the ISR NOP's the failing CMDQ entry and restarts
> it to resume the thread? Is that something else?
> 
> If so you could insert the STE flush instead of a NOP

Nope, sadly the timeout is asynchronous, and CERROR_ATC_INV_SYNC is only 
reported on the *next* CMD_SYNC - it can't even tell us which 
CMD_ATC_INV(s) had a problem. Also there is no NOP; currently the only 
command rewriting we do is for CERROR_ILL, where we turn the illegal 
command into a CMD_SYNC.
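
To illustrate the reporting behaviour described above, here is a toy C
model. It is a sketch, not the driver code: every name in it (toy_cmdq,
toy_consume, and so on) is hypothetical, and it assumes one latched
"timeout pending" flag stands in for the hardware state. It shows only
that a timed-out ATC invalidate is consumed without error, that the
error surfaces at the next sync, and that nothing identifies the
offending command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the arm-smmu-v3 driver's. */
enum toy_op { TOY_ATC_INV, TOY_SYNC, TOY_OTHER };
enum toy_status { TOY_OK, TOY_ERR_ATC_INV_SYNC };

struct toy_cmdq {
	bool atc_timeout_latched;	/* some earlier ATC_INV timed out */
	size_t cons;			/* number of commands consumed */
};

/*
 * Consume one command. 'atc_times_out' says whether this ATC_INV will
 * eventually time out. The queue does not stop on it; the failure is
 * only latched and reported when a later sync is consumed.
 */
static enum toy_status toy_consume(struct toy_cmdq *q, enum toy_op op,
				   bool atc_times_out)
{
	if (op == TOY_ATC_INV && atc_times_out) {
		q->atc_timeout_latched = true;	/* no error reported yet */
	} else if (op == TOY_SYNC && q->atc_timeout_latched) {
		/* Reported here, with no record of which ATC_INV failed. */
		q->atc_timeout_latched = false;
		return TOY_ERR_ATC_INV_SYNC;	/* cons stays on the sync */
	}

	assert(q->cons < (size_t)-1);
	q->cons++;
	return TOY_OK;
}
```

In this model two ATC_INV commands followed by a sync give the same
error whether the first, the second, or both timed out, so there is no
single entry that an error handler could rewrite.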

We couldn't necessarily rely on being able to rewind the hardware CONS
pointer from a CMD_SYNC, as by that point we're likely to have already
observed the advanced CONS value and updated llq->cons, such that other
threads could move llq->prod forward and fill that space with new
commands.
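
The slot-reuse hazard can be sketched with a toy ring in C. This is an
illustration under simplifying assumptions, not the driver's lock-free
queue: the names (toy_ring, toy_push, sw_cons, hw_cons) are
hypothetical, indices are free-running counters, and there is no
concurrency. sw_cons plays the role of the cached llq->cons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_QSZ 4u	/* hypothetical, deliberately tiny queue */

struct toy_ring {
	uint32_t slot[TOY_QSZ];
	uint32_t prod;		/* free-running producer index */
	uint32_t hw_cons;	/* what the "hardware" has consumed */
	uint32_t sw_cons;	/* software's snapshot of hw_cons */
};

/* Producers judge free space only from the software snapshot. */
static bool toy_push(struct toy_ring *r, uint32_t cmd)
{
	assert(r->prod - r->sw_cons <= TOY_QSZ);
	if (r->prod - r->sw_cons == TOY_QSZ)
		return false;	/* full as far as software knows */
	r->slot[r->prod % TOY_QSZ] = cmd;
	r->prod++;
	return true;
}

/* The "hardware" consumes everything queued so far. */
static void toy_hw_consume_all(struct toy_ring *r)
{
	r->hw_cons = r->prod;
}

/* Software observes the hardware index, which frees slots for reuse. */
static void toy_sw_observe(struct toy_ring *r)
{
	r->sw_cons = r->hw_cons;
}
```

Once toy_sw_observe() has run, the next toy_push() overwrites the
oldest slot, so rewinding hw_cons to that index would make the
hardware fetch the new command rather than the one originally queued
there.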

Thanks,
Robin.

> Otherwise the arm_smmu_cmdq_issue_cmdlist() can just push another CMD
> to the queue and sync, it is obviously in a context that can do that.
> 
> Jason



