[PATCH 2/9] iommu/arm-smmu-v3: Use the HW arm_smmu_cmd in cmdq selection functions
Jason Gunthorpe
jgg at nvidia.com
Fri May 8 08:49:50 PDT 2026
On Thu, May 07, 2026 at 09:21:28AM +0000, Mostafa Saleh wrote:
> > static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
> > struct arm_smmu_cmdq_batch *cmds,
> > - struct arm_smmu_cmdq_ent *cmd)
> > + struct arm_smmu_cmdq_ent *ent)
> > {
> > - bool unsupported_cmd = !arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd);
> > bool force_sync = (cmds->num == CMDQ_BATCH_ENTRIES - 1) &&
> > (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC);
> > + struct arm_smmu_cmd cmd;
> > + bool unsupported_cmd;
> > int index;
> >
> > + if (unlikely(arm_smmu_cmdq_build_cmd(cmd.data, ent))) {
> > + dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
> > + ent->opcode);
> > + return;
> > + }
> > +
> > + unsupported_cmd = !arm_smmu_cmdq_supports_cmd(cmds->cmdq, &cmd);
> > if (force_sync || unsupported_cmd) {
> > arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
> > cmds->num, true);
> > - arm_smmu_cmdq_batch_init(smmu, cmds, cmd);
> > + arm_smmu_cmdq_batch_init_cmd(smmu, cmds, &cmd);
> > }
> >
> > if (cmds->num == CMDQ_BATCH_ENTRIES) {
> > arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
> > cmds->num, false);
> > - arm_smmu_cmdq_batch_init(smmu, cmds, cmd);
> > + arm_smmu_cmdq_batch_init_cmd(smmu, cmds, &cmd);
> > }
> >
> > index = cmds->num * CMDQ_ENT_DWORDS;
> > - if (unlikely(arm_smmu_cmdq_build_cmd(&cmds->cmds[index], cmd))) {
> > - dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
> > - cmd->opcode);
> > - return;
> > - }
> > -
> > + memcpy(&cmds->cmds[index], cmd.data, sizeof(cmd.data));
>
> Maybe this would be better squashed with other arm_smmu_cmdq_batch
> patch to avoid this memcpy, but no strong opinion.
The memcpy has always been there. Previously it was effectively inside
arm_smmu_cmdq_build_cmd(); now it is here. A later patch turns it into
a struct variable assignment, which is still a memcpy.
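As a rough illustration of why the struct assignment is "still a memcpy", here is a minimal userspace sketch. struct demo_cmd and the helper names are hypothetical stand-ins, not the driver's types; the only point is that both forms copy the same bytes into the slot.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ENT_DWORDS 2

/* Hypothetical stand-in for a fixed-size hardware command. */
struct demo_cmd {
	uint64_t data[DEMO_ENT_DWORDS];
};

/* Copy into a flat u64 array with an explicit memcpy, as in this patch. */
static void demo_copy_memcpy(uint64_t *cmds, int index,
			     const struct demo_cmd *cmd)
{
	memcpy(&cmds[index], cmd->data, sizeof(cmd->data));
}

/* Copy into a typed array with a struct assignment. */
static void demo_copy_assign(struct demo_cmd *cmds, int slot,
			     const struct demo_cmd *cmd)
{
	cmds[slot] = *cmd;
}

/* Returns 1 if both copies leave identical bytes in the target slot. */
static int demo_copies_match(void)
{
	struct demo_cmd cmd = {
		.data = { 0x1122334455667788ULL, 0x99aabbccddeeff00ULL },
	};
	uint64_t flat[4 * DEMO_ENT_DWORDS];
	struct demo_cmd typed[4];

	memset(flat, 0, sizeof(flat));
	memset(typed, 0, sizeof(typed));

	demo_copy_memcpy(flat, 3 * DEMO_ENT_DWORDS, &cmd);
	demo_copy_assign(typed, 3, &cmd);

	return memcmp(&flat[3 * DEMO_ENT_DWORDS], &typed[3], sizeof(cmd)) == 0;
}
```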
The new thing to avoid is the arm_smmu_cmdq_build_cmd() at the top of
the function, which doesn't go away until the last patch.
This memcpy remains throughout the series since the series doesn't try
to directly initialize the batch in place. Fixing that is problematic
because all the cmdq selection logic relies on an already formed
command, so we need to construct one before we even know what array
index it will land in.
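To make the ordering problem concrete, here is a simplified userspace model, not the driver code. Every demo_* name is hypothetical, and the "secondary queue only accepts opcodes below 0x20" rule is invented purely for illustration. It shows the command being formed on the stack first, because the support check may flush the batch and so change which slot the command ends up in.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ENT_DWORDS    2
#define DEMO_BATCH_ENTRIES 4

struct demo_cmd {
	uint64_t data[DEMO_ENT_DWORDS];
};

struct demo_batch {
	int queue;	/* 0 = main queue, 1 = restricted secondary queue */
	int num;	/* commands currently held in the batch */
	int flushes;	/* how many times the batch was issued */
	struct demo_cmd cmds[DEMO_BATCH_ENTRIES];
};

/* Invented rule: the secondary queue only accepts opcodes below 0x20. */
static bool demo_queue_supports(int queue, const struct demo_cmd *cmd)
{
	uint8_t opcode = cmd->data[0] & 0xff;

	return queue == 0 || opcode < 0x20;
}

/* Stand-in for issuing the batched commands to the hardware queue. */
static void demo_flush(struct demo_batch *b)
{
	b->num = 0;
	b->flushes++;
}

static void demo_batch_add(struct demo_batch *b, uint8_t opcode, uint64_t arg)
{
	/* The command must exist before we know which slot it lands in. */
	struct demo_cmd cmd = { .data = { opcode, arg } };

	/* Selection inspects the formed command and may reset the batch. */
	if (!demo_queue_supports(b->queue, &cmd)) {
		demo_flush(b);
		b->queue = demo_queue_supports(1, &cmd) ? 1 : 0;
	}

	if (b->num == DEMO_BATCH_ENTRIES)
		demo_flush(b);

	/* Only now is the index final, hence the copy. */
	b->cmds[b->num++] = cmd;
}
```

In this model, building the command directly in b->cmds[b->num] would mean either guessing the slot before the flush decision or rebuilding the command afterwards.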
Unwinding that would probably require restructuring how the batch works,
which I think is probably more trouble than it is worth. I hope to
eventually micro-optimize the TLBI flow by removing the batch entirely.
Then we'd be looking at writing the formed invalidation command
directly into the command queue (avoiding another copy on this path).
However, I haven't written this yet and it may not work out.
Thanks,
Jason