[PATCH v7 11/11] iommu/arm-smmu-v3: Add KUnit unit tests for Runtime PM

Nicolin Chen nicolinc at nvidia.com
Thu May 28 14:43:57 PDT 2026


On Wed, May 27, 2026 at 10:14:07PM +0000, Pranjal Shrivastava wrote:
> Introduce a kunit selftests to verify the Runtime PM elision gating,
> post-suspend elisions and progress on resumption under active
> invalidation load. Simulate concurrent HW suspension using a timer.
> Mock all HW registers and CMDQ buffers by allocating them on RAM.
> Make the mock CMDQ self-consuming to avoid hitting queue_full scenarios.
> 
> Signed-off-by: Pranjal Shrivastava <praan at google.com>

Maybe I am wrong, but this reads like entirely AI-generated code.
So, should it have an Assisted-by tag?

Documentation/process/submitting-patches.rst:637:Using Assisted-by:
Documentation/process/submitting-patches.rst:641:you need to acknowledge that use by adding an Assisted-by tag.  Failure to
Documentation/process/coding-assistants.rst:44:Contributions should include an Assisted-by tag in the following format::
Documentation/process/coding-assistants.rst:46:  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
Documentation/process/coding-assistants.rst:59:  Assisted-by: Claude:claude-3-opus coccinelle sparse

> +struct arm_smmu_test_pm_context {
> +	u32 *mock_prod_reg;
> +	u64 *mock_base;
> +	unsigned long *mock_valid_map;
> +};

It seems all about cmdq only, so maybe:

struct arm_smmu_mock_cmdq

?

> +/*
> + * Shared helper to allocate and map a fully functional, self-consuming mock

Is it really a "Shared" helper? I see only one caller..

> + * CMDQ. By redirecting both prod_reg and cons_reg to the same memory location,
> + * every producer write is instantly reflected as a consumer completion.
> + */
> +static struct arm_smmu_test_pm_context *
> +arm_smmu_v3_test_init_mock_cmdq(struct kunit *test, struct arm_smmu_device *smmu)
> +{
> +	struct arm_smmu_test_pm_context *ctx;
> +	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> +
> +	ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL);
> +	KUNIT_ASSERT_NOT_NULL(test, ctx);

This ctx struct isn't that large. Maybe the test function (caller)
can put it on the stack and pass in a pointer, rather than allocating it?

> +	/* Use a standard 1024-entry queue for safe word bitmask alignment */
> +	ctx->mock_prod_reg = kunit_kzalloc(test, sizeof(u32), GFP_KERNEL);

Is a u32 allocation necessary?

> +	ctx->mock_base = kunit_kzalloc(test, 1024 * 16, GFP_KERNEL);

s/16/sizeof(struct arm_smmu_cmd)

> +	ctx->mock_valid_map = kunit_kzalloc(test, BITS_TO_LONGS(1024) * sizeof(long), GFP_KERNEL);
> +
> +	KUNIT_ASSERT_NOT_NULL(test, ctx->mock_prod_reg);
> +	KUNIT_ASSERT_NOT_NULL(test, ctx->mock_base);
> +	KUNIT_ASSERT_NOT_NULL(test, ctx->mock_valid_map);
> +
> +	smmu->features = 0;
> +	/* 1024 entries */
> +	cmdq->q.llq.max_n_shift = 10;
> +	/* SMMUv3 commands are 2 dwords (16 bytes) */
> +	cmdq->q.ent_dwords = 2;

CMDQ_ENT_DWORDS?
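
The two size comments above (sizeof instead of 16, CMDQ_ENT_DWORDS
instead of 2) amount to deriving every size from one definition. A
minimal userspace sketch, assuming a command is an array of 64-bit
words; MOCK_CMDQ_ENT_DWORDS, struct mock_cmd and mock_cmdq_bytes() are
hypothetical stand-ins, not the driver's definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CMDQ_ENT_DWORDS (assumed to be 2 here) */
#define MOCK_CMDQ_ENT_DWORDS 2

/* Hypothetical stand-in for the command type; real layout is an assumption */
struct mock_cmd {
	uint64_t data[MOCK_CMDQ_ENT_DWORDS];
};

/* Queue buffer size in bytes for a queue of (1 << max_n_shift) entries */
static size_t mock_cmdq_bytes(unsigned int max_n_shift)
{
	return ((size_t)1 << max_n_shift) * sizeof(struct mock_cmd);
}
```

With this shape, changing the entry size in one place keeps the
allocation and the queue setup in sync.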

> +	cmdq->q.base = (__le64 *)ctx->mock_base;

ctx->mock_base doesn't seem to have any other use? Why does it have
to be in the ctx structure?

> +	cmdq->valid_map = (atomic_long_t *)ctx->mock_valid_map;

Same questions.

> +	/* Self-Consuming, prod == cons always ensures queue empty */
> +	cmdq->q.prod_reg = (__force u32 __iomem *)ctx->mock_prod_reg;
> +	cmdq->q.cons_reg = (__force u32 __iomem *)ctx->mock_prod_reg;
> +
> +	/* Initialize memory indices */
> +	atomic_set(&cmdq->q.llq.atomic.prod, 0);
> +	atomic_set(&cmdq->q.llq.atomic.cons, 0);
> +	atomic_set(&cmdq->owner_prod, 0);
> +	*ctx->mock_prod_reg = 0;

Arguably, mock_prod_reg is only used to test against a prod snapshot,
so it doesn't seem to need to be in a structure.
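
To illustrate the self-consuming trick from the quoted comment without
a context structure, here is a rough userspace sketch. The caller owns
a single word (it could live on the test function's stack) and both
register pointers alias it. struct mock_queue and the mock_queue_*()
helpers are hypothetical names for illustration only, not driver code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue's prod/cons register pointers */
struct mock_queue {
	uint32_t *prod_reg;
	uint32_t *cons_reg;
};

/* Alias both "registers" to one caller-owned word */
static void mock_queue_init(struct mock_queue *q, uint32_t *reg)
{
	*reg = 0;
	q->prod_reg = reg;
	q->cons_reg = reg;
}

/* A producer write is immediately visible through cons_reg as well */
static void mock_queue_write_prod(struct mock_queue *q, uint32_t prod)
{
	*q->prod_reg = prod;
}

/* Because prod and cons share storage, the queue always looks empty */
static bool mock_queue_empty(const struct mock_queue *q)
{
	return *q->prod_reg == *q->cons_reg;
}
```

The caller can then compare its own word against a prod snapshot
directly, with no extra allocation.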

> +	return ctx;
> +}
> +
> +struct arm_smmu_test_timer_context {
> +	struct kunit *test;
> +	struct arm_smmu_device *smmu;
> +	struct arm_smmu_test_pm_context *pm_ctx;

pm_ctx doesn't seem to be used by the timer callback function?

Is it necessary in this structure? Even "test" isn't used.

> +	struct timer_list timer;
> +	bool stopped;
> +};
> +
> +/* Asynchronous timer callback running in softirq context */

It would be nicer to replace this with the inline comment:

/* Simulate a concurrent runtime suspend event interrupting the invalidations */

> +static void arm_smmu_v3_test_rpm_timer_callback(struct timer_list *t)
> +{
> +	struct arm_smmu_test_timer_context *ctx =
> +		timer_container_of(ctx, t, timer);
> +	struct arm_smmu_cmdq *cmdq = &ctx->smmu->cmdq;
> +
> +	/*
> +	 * Asynchronously set the STOP_FLAG (Close the gate).
> +	 * Simulate a concurrent runtime suspend event interrupting
> +	 * the invalidations happening under active load.
> +	 */
> +	atomic_or(CMDQ_PROD_STOP_FLAG, &cmdq->q.llq.atomic.prod);
> +
> +	/* Signal to the main storm loop that suspend has triggered */
> +	WRITE_ONCE(ctx->stopped, true);

Then, it should be named "suspended".
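
For reference, the callback logic boils down to an atomic OR plus a
flag store. Below is a userspace model using C11 atomics, with the
flag named "suspended" as suggested. MOCK_PROD_STOP_FLAG uses an
assumed bit position (the real CMDQ_PROD_STOP_FLAG value comes from
elsewhere in the series), and struct mock_timer_ctx and the mock_*()
functions are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only */
#define MOCK_PROD_STOP_FLAG (UINT32_C(1) << 30)

struct mock_timer_ctx {
	atomic_uint prod;	/* stands in for cmdq->q.llq.atomic.prod */
	atomic_bool suspended;	/* set once the simulated suspend has fired */
};

static void mock_timer_ctx_init(struct mock_timer_ctx *ctx, unsigned int prod)
{
	atomic_init(&ctx->prod, prod);
	atomic_init(&ctx->suspended, false);
}

/* Simulate a concurrent runtime suspend event interrupting the invalidations */
static void mock_rpm_timer_callback(struct mock_timer_ctx *ctx)
{
	atomic_fetch_or(&ctx->prod, MOCK_PROD_STOP_FLAG);
	atomic_store(&ctx->suspended, true);
}

/* True once the stop flag has been OR-ed into the producer index */
static bool mock_prod_stopped(struct mock_timer_ctx *ctx)
{
	return (atomic_load(&ctx->prod) & MOCK_PROD_STOP_FLAG) != 0;
}
```

The OR leaves the producer index bits untouched, so a snapshot taken
after the callback can still be compared against the index.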

> +}
> +
> +/*
> + * Verify SMMU PM Runtime gating, elision, and post-suspend resumption
> + * safety sequentially under active stress.
> + */
> +static void arm_smmu_v3_test_rpm_stress_race(struct kunit *test)
> +{
> +	struct arm_smmu_device *smmu = kunit_kzalloc(test, sizeof(*smmu), GFP_KERNEL);

arm-smmu-v3-test has a global arm_smmu_device to use.

> +	struct arm_smmu_test_pm_context *pm_ctx;
> +	struct arm_smmu_test_timer_context *timer_ctx;
> +	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> +	struct arm_smmu_cmd cmd = arm_smmu_make_cmd_cfgi_all();

Can we organize this a bit? Maybe in an inverted xmas tree?

Also, it would be nicer to run git-clang-format.

> +	u32 stopped_prod;
> +	int i, ret;
> +
> +	KUNIT_ASSERT_NOT_NULL(test, smmu);
> +	pm_ctx = arm_smmu_v3_test_init_mock_cmdq(test, smmu);
> +
> +	timer_ctx = kunit_kmalloc(test, sizeof(*timer_ctx), GFP_KERNEL);
> +	KUNIT_ASSERT_NOT_NULL(test, timer_ctx);

The allocation doesn't seem necessary either. We use kunit_kzalloc
in this file only when the memory (usually an array) is too large.

> +	timer_ctx->stopped = false;

Then it should have been kunit_kzalloc() at least. Or " = {0};", if
it is kept on the stack.

> +	/* Setup the kernel software timer */
> +	timer_setup(&timer_ctx->timer, arm_smmu_v3_test_rpm_timer_callback, 0);
> +
> +	/* Start the timer to fire in 10 jiffies */
> +	mod_timer(&timer_ctx->timer, jiffies + msecs_to_jiffies(10));

Both lines read pretty clearly without those verbose comments...

> +
> +	/* Execute the unmap storm until the timer triggers */
> +	while (!READ_ONCE(timer_ctx->stopped)) {
> +		ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, &cmd, 1, false);
> +		if (ret != 0) {
> +			KUNIT_FAIL(test, "Unmap Storm loop failed to submit cmd list: %d\n", ret);
> +			break;
> +		}

As of now, arm_smmu_cmdq_issue_cmdlist() fails only with -ETIMEDOUT
and already prints an error. So, maybe simplify it:
		if (arm_smmu_cmdq_issue_cmdlist())
			break;
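
A userspace model of the simplified loop shape, in case it helps. The
submit path is a stub that returns 0 or -ETIMEDOUT, and the "timer" is
modelled as a call-count threshold. struct mock_submitter and the
mock_*() functions are hypothetical and only mimic the control flow:

```c
#include <errno.h>
#include <stdbool.h>

struct mock_submitter {
	int calls;	/* successful submissions so far */
	int fail_after;	/* stub starts failing after this many successes */
	int stop_after;	/* simulated suspend fires after this many successes */
};

/* Stub for the submit path: 0 on success, -ETIMEDOUT on failure */
static int mock_issue_cmdlist(struct mock_submitter *s)
{
	if (s->calls >= s->fail_after)
		return -ETIMEDOUT;
	s->calls++;
	return 0;
}

static bool mock_suspended(const struct mock_submitter *s)
{
	return s->calls >= s->stop_after;
}

/* Submit until the simulated suspend fires or a submission fails */
static int mock_storm(struct mock_submitter *s)
{
	while (!mock_suspended(s)) {
		if (mock_issue_cmdlist(s))
			break;
	}
	return s->calls;
}
```

The loop exits on either condition without an extra failure message,
on the assumption that the callee has already reported the error.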
>  static struct kunit_case arm_smmu_v3_test_cases[] = {
> +	KUNIT_CASE(arm_smmu_v3_test_rpm_stress_race),
>  	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort),
>  	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass),
>  	KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort),

Usually we append to the end. And following the naming convention:
	arm_smmu_v3_rpm_test_stress_race

Nicolin


