[PATCH v2 7/7] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout

Nicolin Chen nicolinc at nvidia.com
Wed Mar 18 20:26:42 PDT 2026


On Thu, Mar 19, 2026 at 10:56:43AM +0800, Shuai Xue wrote:
> On 3/18/26 3:15 AM, Nicolin Chen wrote:
> > For batched ATC_INV commands, SMMU hardware only reports a timeout at the
> > CMD_SYNC, which could follow the batch issued for multiple devices. So, it
> > isn't straightforward to identify which command in a batch resulted in the
> > timeout. Fortunately, the invs array has a sorted list of ATC entries. So,
> > the issued batch must be sorted as well. This makes it possible to bisect
> > the batch to retry the command per Stream ID and identify the master.
> 
> Nit: The implementation is a linear per-SID retry, not a binary
> search / bisection. Suggest rewording to:
> 
>   "retry the ATC_INV command for each unique Stream ID in the batch
>    to identify the unresponsive master"

You are right. And that sounds OK.

> > +	step = arm_smmu_get_step_for_sid(smmu, sid);
> > +	WRITE_ONCE(step->data[1],
> > +		   READ_ONCE(step->data[1]) & cpu_to_le64(~STRTAB_STE_1_EATS));
> 
> 
> This non-atomic read-modify-write on step->data[1] can race with the
> normal STE installation path (arm_smmu_write_entry → entry_set →
> WRITE_ONCE).
> 
> The error path runs from:
> 
>   __arm_smmu_domain_inv_range()  (data path, no group->mutex)
>     → arm_smmu_cmdq_batch_retry()
>       → arm_smmu_master_disable_ats()
>         → arm_smmu_disable_eats_for_sid()   ← NO locks on STE
> 
> The normal STE path runs from:
> 
>   iommu_attach_device()
>     → mutex_lock(&group->mutex)
>       → arm_smmu_attach_dev()
>         → mutex_lock(&arm_smmu_asid_lock)
>           → arm_smmu_install_ste_for_dev()
>             → arm_smmu_write_entry()        ← holds both mutexes
> 
> Since the error path holds neither group->mutex nor arm_smmu_asid_lock,
> the following race is possible:

Invalidations can run in atomic context, so we can't take those
mutexes on this path.

>   CPU A (error path):             CPU B (attach path):
>   READ data[1] = X
>                                   WRITE data[1] = Y (new STE config)
>   WRITE data[1] = X & ~EATS
>                                   // Y is lost
> 
> This could clobber a concurrent STE update from the attach path.

Oh, that's true. Maybe this:
	__le64 new, old = READ_ONCE(step->data[1]);
	[...]
	do {
		new = old & cpu_to_le64(~STRTAB_STE_1_EATS);
	} while (!try_cmpxchg64(&step->data[1], &old, new));
?

Thanks!
Nicolin



More information about the linux-arm-kernel mailing list