[PATCH rfcv1 7/8] iommu/arm-smmu-v3: Add arm_smmu_invs based arm_smmu_domain_inv_range()

Jason Gunthorpe jgg at nvidia.com
Wed Aug 27 11:49:23 PDT 2025


On Wed, Aug 13, 2025 at 06:25:38PM -0700, Nicolin Chen wrote:
> +void arm_smmu_domain_inv_range(struct arm_smmu_domain *smmu_domain,
> +			       unsigned long iova, size_t size,
> +			       unsigned int granule, bool leaf)
> +{
> +	struct arm_smmu_cmdq_batch cmds = {};
> +	struct arm_smmu_invs *invs;
> +	bool retried = false;
> +	size_t i;
> +
> +	/*
> +	 * An invalidation request must follow some IOPTE change and then load
> +	 * the invalidation array. In the meantime, a domain attachment mutates
> +	 * the array and then stores an STE/CD asking SMMU HW to acquire those
> +	 * changed IOPTEs. In other words, the two are interdependent and can
> +	 * race.
> +	 *
> +	 * In a race, the RCU design (with its underlying memory barriers)
> +	 * ensures the invalidation array is always updated before it is loaded.
> +	 *
> +	 * smp_mb() is used here, paired with the smp_mb() following the array
> +	 * update in a concurrent attach, to ensure:
> +	 *  - HW sees the new IOPTEs if it walks after STE installation
> +	 *  - Invalidation thread sees the updated array with the new ASID.
> +	 *
> +	 *  [CPU0]                        | [CPU1]
> +	 *                                |
> +	 *  change IOPTEs and TLB flush:  |
> +	 *  arm_smmu_domain_inv_range() { | arm_smmu_install_new_domain_invs {
> +	 *    ...                         |   rcu_assign_pointer(new_invs);
> +	 *    smp_mb(); // ensure IOPTEs  |   smp_mb(); // ensure new_invs
> +	 *    ...                         |   kfree_rcu(old_invs, rcu);
> +	 *    // load invalidation array  | }
> +	 *    invs = rcu_dereference();   | arm_smmu_install_ste_for_dev {
> +	 *                                |   STE = TTB0 // read new IOPTEs
> +	 */
> +	smp_mb();
> +
> +	rcu_read_lock();
> +again:
> +	invs = rcu_dereference(smmu_domain->invs);
> +
> +	/* A concurrent attachment might have changed the array. Do a respin */
> +	if (unlikely(!read_trylock(&invs->rwlock)))
> +		goto again;
> +	/* Only one retry. Otherwise, it would soft lockup on an empty array */
> +	if (!retried && unlikely(!invs->num_invs)) {
> +		read_unlock(&invs->rwlock);
> +		retried = true;
> +		goto again;
> +	}

This has missed the point; the idea was to not take the lock unless we
have ATS. Something like this:


	rcu_read_lock();

	while (true) {
		invs = rcu_dereference(smmu_domain->invs);

		/*
		 * Avoid locking unless ATS is being used. No ATS invalidate can
		 * be going on after a domain is detached.
		 */
		locked = false;
		if (invs->has_ats || READ_ONCE(invs->old)) {
			read_lock(&invs->rwlock);
			if (invs->old) {
				read_unlock(&invs->rwlock);
				continue;
			}
			locked = true;
		}
		break;
	}

	num_invs = READ_ONCE(invs->num_invs);
	for (i = 0; i != num_invs; i++) {
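
The loop body would then be the per-entry switch from the patch. For
completeness, the exit path presumably has to mirror the locked flag,
something like:

	/* after walking the array */
	if (locked)
		read_unlock(&invs->rwlock);
	rcu_read_unlock();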

> +		case INV_TYPE_S2_VMID_S1_CLEAR:
> +			/* CMDQ_OP_TLBI_S12_VMALL already flushed S1 entries */
> +			if (arm_smmu_inv_size_too_big(cur->smmu, size, granule))
> +				continue;
> +			/* Just a single CMDQ_OP_TLBI_NH_ALL, no batching */
> +			cmd.tlbi.vmid = cur->id;
> +			arm_smmu_cmdq_issue_cmd_with_sync(cur->smmu, &cmd);

This should just stick it in the batch; the batch was already flushed by
arm_smmu_invs_end_batch():

	/* The batch for S2 TLBI must be done before nested S1 ASIDs */
	if (next->type == INV_TYPE_S2_VMID_S1_CLEAR)
		return true;

That needs a fixup too:

	/* The batch for S2 TLBI must be done before nested S1 ASIDs */
	if (cur->type != INV_TYPE_S2_VMID_S1_CLEAR &&
	    next->type == INV_TYPE_S2_VMID_S1_CLEAR)
		return true;

It makes sense to bundle all the NH_ALL into one batch if there is more
than one.
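
Roughly, the case could then end up as something like this (a sketch
only; the cmd setup and helper names are guessed from the quoted hunk
and from the existing arm_smmu_cmdq_batch_add(), and may differ in the
respin):

	case INV_TYPE_S2_VMID_S1_CLEAR:
		if (arm_smmu_inv_size_too_big(cur->smmu, size, granule))
			continue;
		/* end_batch already flushed the S2 TLBIs, so just queue it */
		cmd.opcode = CMDQ_OP_TLBI_NH_ALL;
		cmd.tlbi.vmid = cur->id;
		arm_smmu_cmdq_batch_add(cur->smmu, &cmds, &cmd);
		break;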

But this arm_smmu_invs_end_batch() no longer works right since the

		if (refcount_read(&cur->users) == 0)
			continue;

check was added.
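
One way out could be to compute 'next' as the next entry that will
actually be issued, skipping the users == 0 ones. A hypothetical helper,
only to illustrate the idea (it does not exist in the series):

	/* hypothetical: find the next entry that will actually be issued */
	static struct arm_smmu_inv *
	arm_smmu_invs_next_used(struct arm_smmu_inv *cur, struct arm_smmu_inv *end)
	{
		for (cur++; cur != end; cur++) {
			if (refcount_read(&cur->users))
				return cur;
		}
		return NULL;
	}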

Jason


