[PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
Robin Murphy
robin.murphy at arm.com
Fri Mar 6 05:22:11 PST 2026
On 2026-03-05 11:41 pm, Jason Gunthorpe wrote:
> On Thu, Mar 05, 2026 at 01:15:45PM -0800, Nicolin Chen wrote:
>
>> You mean in arm_smmu_cmdq_issue_cmdlist() that issued the timed
>> out ATC command?
>
> Yes, it was my offhand thought.
>
>> So my test case was to trigger a device fault followed by an ATC
>> command. But, I found that the ATC command submission returned 0
>> while only the ISR received:
>> CMDQ error (cons 0x03000003): ATC invalidate timeout
>> arm_smmu_debugfs_atc_write: ATC_INV ret=0
>>
>> It seems difficult to insert a CMDQ_OP_CFGI_STE in the submission
>> thread?
>
> I didn't look, but I thought the CMDQ stops on the ATC invalidation,
> flags the error and the ISR NOPs the failing CMDQ entry and restarts
> it to resume the thread? Is that something else?
>
> If so you could insert the STE flush instead of a NOP
Nope, sadly the timeout is asynchronous, and CERROR_ATC_INV_SYNC is only
reported on the *next* CMD_SYNC - it can't even tell us which
CMD_ATC_INV(s) had a problem. Also there is no NOP; currently the only
command rewriting we do is for CERROR_ILL, where we turn the illegal
command into a CMD_SYNC.
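As a rough illustration of that reporting behaviour, here is a toy C model. It is not driver code, and every `toy_` name is made up for this sketch. It assumes a timed-out CMD_ATC_INV is only noticed when the next CMD_SYNC is consumed, at which point the queue stops with CONS at the sync and every ATC invalidation issued since the previous sync is equally suspect.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the driver's. */
enum toy_op { TOY_OP_OTHER, TOY_OP_ATC_INV, TOY_OP_SYNC };

struct toy_result {
    int error;        /* 1 if an ATC invalidate timeout was reported */
    size_t cons;      /* index at which the queue stopped (n if it drained) */
    size_t suspects;  /* ATC_INVs that could have been the one that timed out */
};

/*
 * Consume n commands. timed_out[i] != 0 marks an ATC_INV whose completion
 * never arrives. The timeout is only noticed at the next SYNC, so the
 * result can name the SYNC's index but not the failing ATC_INV.
 */
static struct toy_result toy_consume(const enum toy_op *cmds,
                                     const int *timed_out, size_t n)
{
    struct toy_result res = { 0, n, 0 };
    size_t pending = 0;  /* ATC_INVs issued since the last SYNC */
    int late = 0;        /* set if any pending ATC_INV timed out */

    assert(cmds && timed_out);

    for (size_t i = 0; i < n; i++) {
        if (cmds[i] == TOY_OP_ATC_INV) {
            pending++;
            late |= (timed_out[i] != 0);
        } else if (cmds[i] == TOY_OP_SYNC) {
            if (late) {
                res.error = 1;
                res.cons = i;           /* stopped at the SYNC itself */
                res.suspects = pending; /* no way to narrow this down */
                return res;
            }
            pending = 0;
        }
    }
    return res;
}
```

With the commands ATC_INV, OTHER, ATC_INV, SYNC and only the first invalidation timing out, the model stops at index 3 with two suspects, so there is no single failing entry for an error handler to rewrite.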
We couldn't necessarily rely on being able to rewind the hardware CONS
pointer from a CMD_SYNC, as by that point we're likely to have observed
it and updated llq->cons, such that other threads could move llq->prod
forward and fill that space with new commands.
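The rewind hazard can be sketched with a toy ring buffer in C. This is a simplified model under stated assumptions, not the driver's queue code: the `toy_` names, the four-entry queue, and the free-running indices are all invented for illustration, with `sw_cons` standing in for the cached llq->cons.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_Q_SIZE 4u  /* hypothetical tiny queue, power of two */

struct toy_queue {
    uint32_t slots[TOY_Q_SIZE]; /* command IDs, for illustration */
    uint32_t prod;              /* free-running producer index */
    uint32_t hw_cons;           /* hardware consumer index */
    uint32_t sw_cons;           /* software's cached copy (like llq->cons) */
};

/* Producers decide whether there is room from the cached consumer index. */
static int toy_push(struct toy_queue *q, uint32_t id)
{
    if (q->prod - q->sw_cons >= TOY_Q_SIZE)
        return 0; /* queue looks full */
    q->slots[q->prod % TOY_Q_SIZE] = id;
    q->prod++;
    return 1;
}

/* Hardware advances past n commands. */
static void toy_hw_consume(struct toy_queue *q, uint32_t n)
{
    q->hw_cons += n;
}

/* Software observes the hardware index and caches it. */
static void toy_sw_observe(struct toy_queue *q)
{
    q->sw_cons = q->hw_cons;
}

/* Rewind the hardware index by 'back' entries and return what it would fetch. */
static uint32_t toy_rewind_and_fetch(struct toy_queue *q, uint32_t back)
{
    q->hw_cons -= back;
    return q->slots[q->hw_cons % TOY_Q_SIZE];
}
```

In this model, once software has cached a consumer index that is past the ATC invalidation, another thread can legitimately push a new command into that slot. Rewinding from the sync would then make the hardware fetch the new command rather than the original invalidation.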
Thanks,
Robin.
> Otherwise the arm_smmu_cmdq_issue_cmdlist() can just push another CMD
> to the queue and sync, it is obviously in a context that can do that.
>
> Jason