[PATCH v7 11/11] iommu/arm-smmu-v3: Add KUnit unit tests for Runtime PM
Nicolin Chen
nicolinc at nvidia.com
Thu May 28 14:43:57 PDT 2026
On Wed, May 27, 2026 at 10:14:07PM +0000, Pranjal Shrivastava wrote:
> Introduce KUnit selftests to verify the Runtime PM elision gating,
> post-suspend elisions and progress on resumption under active
> invalidation load. Simulate concurrent HW suspension using a timer.
> Mock all HW registers and CMDQ buffers by allocating them on RAM.
> Make the mock CMDQ self-consuming to avoid hitting queue_full scenarios.
>
> Signed-off-by: Pranjal Shrivastava <praan at google.com>
Maybe I am wrong, but this reads like entirely AI-generated code.
If so, should it have an Assisted-by tag?
Documentation/process/submitting-patches.rst:637:Using Assisted-by:
Documentation/process/submitting-patches.rst:641:you need to acknowledge that use by adding an Assisted-by tag. Failure to
Documentation/process/coding-assistants.rst:44:Contributions should include an Assisted-by tag in the following format::
Documentation/process/coding-assistants.rst:46: Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
Documentation/process/coding-assistants.rst:59: Assisted-by: Claude:claude-3-opus coccinelle sparse
> +struct arm_smmu_test_pm_context {
> + u32 *mock_prod_reg;
> + u64 *mock_base;
> + unsigned long *mock_valid_map;
> +};
It seems to be all about the cmdq, so maybe:
struct arm_smmu_mock_cmdq
?
> +/*
> + * Shared helper to allocate and map a fully functional, self-consuming mock
Is it really a "Shared" helper? I see only one caller.
> + * CMDQ. By redirecting both prod_reg and cons_reg to the same memory location,
> + * every producer write is instantly reflected as a consumer completion.
> + */
> +static struct arm_smmu_test_pm_context *
> +arm_smmu_v3_test_init_mock_cmdq(struct kunit *test, struct arm_smmu_device *smmu)
> +{
> + struct arm_smmu_test_pm_context *ctx;
> + struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> +
> + ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, ctx);
This ctx struct isn't that large. Maybe the test function (the caller)
can put it on the stack and pass its pointer in, rather than allocating it?
> + /* Use a standard 1024-entry queue for safe word bitmask alignment */
> + ctx->mock_prod_reg = kunit_kzalloc(test, sizeof(u32), GFP_KERNEL);
Is a separate allocation necessary for a single u32?
> + ctx->mock_base = kunit_kzalloc(test, 1024 * 16, GFP_KERNEL);
s/16/sizeof(struct arm_smmu_cmd)
> + ctx->mock_valid_map = kunit_kzalloc(test, BITS_TO_LONGS(1024) * sizeof(long), GFP_KERNEL);
> +
> + KUNIT_ASSERT_NOT_NULL(test, ctx->mock_prod_reg);
> + KUNIT_ASSERT_NOT_NULL(test, ctx->mock_base);
> + KUNIT_ASSERT_NOT_NULL(test, ctx->mock_valid_map);
> +
> + smmu->features = 0;
> + /* 1024 entries */
> + cmdq->q.llq.max_n_shift = 10;
> + /* SMMUv3 commands are 2 dwords (16 bytes) */
> + cmdq->q.ent_dwords = 2;
CMDQ_ENT_DWORDS?
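For illustration, a minimal userspace sketch of how the magic numbers above could be derived from the queue geometry instead of being hard-coded. It assumes 8-byte dwords and re-implements BITS_TO_LONGS() locally as the hypothetical MOCK_BITS_TO_LONGS(); mock_cmdq_sizes() is also hypothetical and not part of the driver. With max_n_shift = 10 the entry count is a multiple of BITS_PER_LONG, which is presumably what the "safe word bitmask alignment" comment refers to: valid_map then has no partially used long.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's BITS_PER_LONG/BITS_TO_LONGS() */
#define MOCK_BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define MOCK_BITS_TO_LONGS(nr) (((nr) + MOCK_BITS_PER_LONG - 1) / MOCK_BITS_PER_LONG)

struct mock_cmdq_sizes {
	size_t n_entries;       /* 1 << max_n_shift */
	size_t base_bytes;      /* n_entries * ent_dwords * 8 */
	size_t valid_map_bytes; /* one bit per entry, rounded up to whole longs */
};

/* Derive every buffer size from max_n_shift and ent_dwords (assumed 8-byte dwords). */
static struct mock_cmdq_sizes mock_cmdq_sizes(unsigned int max_n_shift,
					      unsigned int ent_dwords)
{
	struct mock_cmdq_sizes s;

	assert(max_n_shift < 32);
	s.n_entries = (size_t)1 << max_n_shift;
	s.base_bytes = s.n_entries * ent_dwords * sizeof(uint64_t);
	s.valid_map_bytes = MOCK_BITS_TO_LONGS(s.n_entries) * sizeof(long);
	return s;
}
```

With this in place, changing the queue depth only touches max_n_shift.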
> + cmdq->q.base = (__le64 *)ctx->mock_base;
ctx->mock_base doesn't seem to have any other use? Why does it have
to be in the ctx structure?
> + cmdq->valid_map = (atomic_long_t *)ctx->mock_valid_map;
Same questions.
> + /* Self-Consuming, prod == cons always ensures queue empty */
> + cmdq->q.prod_reg = (__force u32 __iomem *)ctx->mock_prod_reg;
> + cmdq->q.cons_reg = (__force u32 __iomem *)ctx->mock_prod_reg;
> +
> + /* Initialize memory indices */
> + atomic_set(&cmdq->q.llq.atomic.prod, 0);
> + atomic_set(&cmdq->q.llq.atomic.cons, 0);
> + atomic_set(&cmdq->owner_prod, 0);
> + *ctx->mock_prod_reg = 0;
Arguably, mock_prod_reg is only used to test against a prod snapshot
so it doesn't seem necessary to be in a structure.
> + return ctx;
> +}
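For illustration, a minimal userspace sketch of the self-consuming trick, assuming a plain uint32_t stands in for the MMIO register and a hypothetical struct mock_queue replaces struct arm_smmu_queue. The full/empty check is simplified to "index plus one wrap bit", roughly mirroring the driver's prod/cons encoding. Because prod_reg and cons_reg alias one word, every producer write also looks like a consumer completion, so the queue never reports full. A queue with separate registers is used for contrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMDQ register pair; not the driver's type */
struct mock_queue {
	volatile uint32_t *prod_reg;
	volatile uint32_t *cons_reg;
	unsigned int max_n_shift; /* queue holds 1 << max_n_shift entries */
};

/* Alias both registers to one word so producer writes double as completions */
static void mock_queue_init(struct mock_queue *q, volatile uint32_t *reg,
			    unsigned int max_n_shift)
{
	assert(max_n_shift < 31);
	*reg = 0;
	q->prod_reg = reg;
	q->cons_reg = reg;
	q->max_n_shift = max_n_shift;
}

/* Mask covering the index bits plus one wrap bit */
static uint32_t mock_queue_mask(const struct mock_queue *q)
{
	return (1u << (q->max_n_shift + 1)) - 1;
}

/* Entries published by the producer but not yet consumed */
static uint32_t mock_queue_pending(const struct mock_queue *q)
{
	return (*q->prod_reg - *q->cons_reg) & mock_queue_mask(q);
}

/* Publish n entries by advancing prod, like the driver's prod_reg write */
static void mock_queue_publish(struct mock_queue *q, uint32_t n)
{
	*q->prod_reg = (*q->prod_reg + n) & mock_queue_mask(q);
}

static bool mock_queue_full(const struct mock_queue *q)
{
	return mock_queue_pending(q) == (1u << q->max_n_shift);
}
```

With aliased registers, mock_queue_pending() is always 0. With distinct registers, 1024 publishes into a 1024-entry queue make it full.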
> +
> +struct arm_smmu_test_timer_context {
> + struct kunit *test;
> + struct arm_smmu_device *smmu;
> + struct arm_smmu_test_pm_context *pm_ctx;
pm_ctx doesn't seem to be used by the timer callback function?
Is it necessary in this structure? Even "test" isn't used.
> + struct timer_list timer;
> + bool stopped;
> +};
> +
> +/* Asynchronous timer callback running in softirq context */
It would be nicer to replace this with the inline comment:
/* Simulate a concurrent runtime suspend event interrupting the invalidations */
> +static void arm_smmu_v3_test_rpm_timer_callback(struct timer_list *t)
> +{
> + struct arm_smmu_test_timer_context *ctx =
> + timer_container_of(ctx, t, timer);
> + struct arm_smmu_cmdq *cmdq = &ctx->smmu->cmdq;
> +
> + /*
> + * Asynchronously set the STOP_FLAG (Close the gate).
> + * Simulate a concurrent runtime suspend event interrupting
> + * the invalidations happening under active load.
> + */
> + atomic_or(CMDQ_PROD_STOP_FLAG, &cmdq->q.llq.atomic.prod);
> +
> + /* Signal to the main storm loop that suspend has triggered */
> + WRITE_ONCE(ctx->stopped, true);
Then, it should be named "suspended".
> +}
> +
> +/*
> + * Verify SMMU PM Runtime gating, elision, and post-suspend resumption
> + * safety sequentially under active stress.
> + */
> +static void arm_smmu_v3_test_rpm_stress_race(struct kunit *test)
> +{
> + struct arm_smmu_device *smmu = kunit_kzalloc(test, sizeof(*smmu), GFP_KERNEL);
arm-smmu-v3-test has a global arm_smmu_device to use.
> + struct arm_smmu_test_pm_context *pm_ctx;
> + struct arm_smmu_test_timer_context *timer_ctx;
> + struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> + struct arm_smmu_cmd cmd = arm_smmu_make_cmd_cfgi_all();
Can we organize these declarations a bit, maybe in reverse xmas tree order?
Also, it would be nice to run git-clang-format.
> + u32 stopped_prod;
> + int i, ret;
> +
> + KUNIT_ASSERT_NOT_NULL(test, smmu);
> + pm_ctx = arm_smmu_v3_test_init_mock_cmdq(test, smmu);
> +
> + timer_ctx = kunit_kmalloc(test, sizeof(*timer_ctx), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, timer_ctx);
The allocation doesn't seem necessary either. We use kunit_kzalloc
in this file only when the memory (usually an array) is too large
for the stack.
> + timer_ctx->stopped = false;
Then it should have been kunit_kzalloc() at least, or "= {0};" if
it stays on the stack.
> + /* Setup the kernel software timer */
> + timer_setup(&timer_ctx->timer, arm_smmu_v3_test_rpm_timer_callback, 0);
> +
> + /* Start the timer to fire in 10 ms */
> + mod_timer(&timer_ctx->timer, jiffies + msecs_to_jiffies(10));
Both lines read clearly enough without those verbose comments.
> +
> + /* Execute the unmap storm until the timer triggers */
> + while (!READ_ONCE(timer_ctx->stopped)) {
> + ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, &cmd, 1, false);
> + if (ret != 0) {
> + KUNIT_FAIL(test, "Unmap Storm loop failed to submit cmd list: %d\n", ret);
> + break;
> + }
As of now, arm_smmu_cmdq_issue_cmdlist() fails only with -ETIMEDOUT
and already prints an error. So maybe simplify it to:
	if (arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, &cmd, 1, false))
		break;
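For illustration, a minimal userspace sketch of the "storm until a concurrent suspend closes the gate" pattern. It assumes pthreads in place of the kernel timer, a hypothetical MOCK_STOP_FLAG bit in place of CMDQ_PROD_STOP_FLAG, and a simplified mock_submit() that elides a submission once the flag is set. This is an approximation of the idea, not the driver's arm_smmu_cmdq_issue_cmdlist() logic. The flag uses the suggested "suspended" name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical gate bit; the real CMDQ_PROD_STOP_FLAG value may differ */
#define MOCK_STOP_FLAG (1u << 31)

struct mock_rpm {
	atomic_uint prod;      /* low bits: index, top bit: stop flag */
	atomic_bool suspended; /* set once the simulated suspend happened */
};

/* Advance prod unless the gate is closed; a closed gate elides the command */
static bool mock_submit(struct mock_rpm *r)
{
	unsigned int old = atomic_load(&r->prod);

	do {
		if (old & MOCK_STOP_FLAG)
			return false;
	} while (!atomic_compare_exchange_weak(&r->prod, &old, old + 1));
	return true;
}

/* Simulated concurrent runtime suspend, standing in for the timer callback */
static void *mock_suspend_thread(void *arg)
{
	struct mock_rpm *r = arg;
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	nanosleep(&ts, NULL);
	atomic_fetch_or(&r->prod, MOCK_STOP_FLAG);
	atomic_store(&r->suspended, true);
	return NULL;
}

/* Submit until suspended; return the prod index frozen by the stop flag */
static unsigned int mock_storm(struct mock_rpm *r)
{
	pthread_t t;
	int err;

	atomic_init(&r->prod, 0);
	atomic_init(&r->suspended, false);
	err = pthread_create(&t, NULL, mock_suspend_thread, r);
	while (!err && !atomic_load(&r->suspended))
		mock_submit(r);
	if (err)
		mock_suspend_thread(r); /* no thread: suspend synchronously */
	else
		pthread_join(t, NULL);
	return atomic_load(&r->prod) & ~MOCK_STOP_FLAG;
}
```

Once the flag is set, the index is frozen. The compare-and-swap makes a submission that races with the flag fail and re-check, so nothing advances prod after the gate closes.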
> static struct kunit_case arm_smmu_v3_test_cases[] = {
> + KUNIT_CASE(arm_smmu_v3_test_rpm_stress_race),
> KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort),
> KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass),
> KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort),
Usually we append new cases to the end. Also, following the naming
convention, it should be:
arm_smmu_v3_rpm_test_stress_race
Nicolin
More information about the linux-arm-kernel mailing list