[RFC PATCH 27/45] KVM: arm64: smmu-v3: Setup domains and page table configuration

Jean-Philippe Brucker jean-philippe at linaro.org
Mon Feb 26 06:18:39 PST 2024


On Fri, Feb 16, 2024 at 12:11:48PM +0000, Mostafa Saleh wrote:
> On Tue, Jan 23, 2024 at 7:50 PM Jean-Philippe Brucker
> <jean-philippe at linaro.org> wrote:
> >
> > On Mon, Jan 15, 2024 at 02:34:12PM +0000, Mostafa Saleh wrote:
> > > > +static void smmu_tlb_inv_range(struct kvm_iommu_tlb_cookie *data,
> > > > +                              unsigned long iova, size_t size, size_t granule,
> > > > +                              bool leaf)
> > > > +{
> > > > +       struct hyp_arm_smmu_v3_device *smmu = to_smmu(data->iommu);
> > > > +       unsigned long end = iova + size;
> > > > +       struct arm_smmu_cmdq_ent cmd = {
> > > > +               .opcode = CMDQ_OP_TLBI_S2_IPA,
> > > > +               .tlbi.vmid = data->domain_id,
> > > > +               .tlbi.leaf = leaf,
> > > > +       };
> > > > +
> > > > +       /*
> > > > +        * There are no mappings at high addresses since we don't use TTB1, so
> > > > +        * no overflow possible.
> > > > +        */
> > > > +       BUG_ON(end < iova);
> > > > +
> > > > +       while (iova < end) {
> > > > +               cmd.tlbi.addr = iova;
> > > > +               WARN_ON(smmu_send_cmd(smmu, &cmd));
> > >
> > > This would issue a sync command between each range, which is not needed,
> > > maybe we can build the command first and then issue the sync, similar
> > > to what the upstream driver does, what do you think?
> >
> > Yes, moving the sync out of the loop would be better. To keep things
> > simple I'd just replace this with smmu_add_cmd() and add a smmu_sync_cmd()
> > at the end, but maybe some implementations won't consume the TLBI itself
> > fast enough, and we need to build a command list in software. Do you think
> > smmu_add_cmd() is sufficient here?
> 
> Replacing this with smmu_add_cmd makes sense.
> We only poll the queue at SYNC, which is the last command, so the pace
> of TLBI consumption doesn't matter, I believe?

Yes, only smmu_sync_cmd() waits for consumption (unless the queue is full
when we attempt to add a cmd). And submitting the TLBIs early could allow
the hardware to do some processing while we prepare the next commands, but
I don't know if it actually works that way.
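
As a rough sketch of the change discussed above (queue each TLBI without
waiting, then issue a single sync after the loop), something like the
following could work. The mock_* types and helpers are hypothetical
stand-ins for the hypervisor's smmu_add_cmd()/smmu_sync_cmd(), and the
sketch assumes the loop advances by one granule per command, which the
quoted hunk does not show:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor types; not the real driver API. */
enum mock_opcode { MOCK_OP_TLBI_S2_IPA, MOCK_OP_SYNC };

struct mock_cmd {
	enum mock_opcode opcode;
	uint64_t addr;
};

struct mock_smmu {
	unsigned int nr_tlbi;	/* TLBI commands queued */
	unsigned int nr_sync;	/* sync commands issued; each one polls */
};

/* Queue one command without waiting for it to be consumed. */
static int mock_add_cmd(struct mock_smmu *smmu, const struct mock_cmd *cmd)
{
	(void)cmd;
	smmu->nr_tlbi++;
	return 0;
}

/* Queue a sync and wait for completion: the only place that polls. */
static int mock_sync_cmd(struct mock_smmu *smmu)
{
	smmu->nr_sync++;
	return 0;
}

static int mock_tlb_inv_range(struct mock_smmu *smmu, uint64_t iova,
			      size_t size, size_t granule)
{
	uint64_t end = iova + size;
	struct mock_cmd cmd = { .opcode = MOCK_OP_TLBI_S2_IPA };
	int ret;

	/* Reject address overflow and a zero step instead of BUG_ON(). */
	if (end < iova || granule == 0)
		return -1;

	while (iova < end) {
		cmd.addr = iova;
		ret = mock_add_cmd(smmu, &cmd);
		if (ret)
			return ret;
		/* Stop before iova + granule could wrap around. */
		if (end - iova <= granule)
			break;
		iova += granule;	/* assumed step; not in the quoted hunk */
	}

	/* One sync for the whole range rather than one per TLBI. */
	return mock_sync_cmd(smmu);
}
```

With these mocks, invalidating four granules queues four TLBIs and polls
once, instead of polling four times.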

> 
> One advantage of building the command list first is that we also
> avoid MMIO access for the queue, which can be slow.

Yes, I'm curious about the overhead of MMIO on some of these platforms.
Maybe we should do some software batching if you're able to measure a
performance impact from reading and writing CMDQ indices, but I suspect
the map/unmap context switches completely overshadow it at the moment.
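
For reference, software batching could look roughly like the sketch
below: commands accumulate in a local array and are copied into the
queue with a single producer-index update per batch. Everything here
(the mock_* names, the batch size, counting index updates in
mmio_writes) is made up for illustration, and the sketch ignores the
queue-full case, where the consumer index would have to be read:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CMDQ_ENTRIES	64	/* hypothetical queue size */
#define MOCK_BATCH_MAX		16	/* hypothetical batch size */

struct mock_cmdq {
	uint64_t ring[MOCK_CMDQ_ENTRIES];	/* queue entries, in memory */
	unsigned int prod;			/* producer index */
	unsigned int mmio_writes;		/* index updates, i.e. MMIO writes */
};

struct mock_batch {
	uint64_t cmds[MOCK_BATCH_MAX];
	unsigned int num;
};

/* Copy the batch into the ring, then publish it with one index update. */
static void mock_batch_flush(struct mock_cmdq *q, struct mock_batch *b)
{
	unsigned int i;

	if (!b->num)
		return;

	for (i = 0; i < b->num; i++)
		q->ring[(q->prod + i) % MOCK_CMDQ_ENTRIES] = b->cmds[i];

	q->prod = (q->prod + b->num) % MOCK_CMDQ_ENTRIES;
	q->mmio_writes++;
	b->num = 0;
}

/* Add a command to the local batch, flushing first if the batch is full. */
static void mock_batch_add(struct mock_cmdq *q, struct mock_batch *b,
			   uint64_t cmd)
{
	if (b->num == MOCK_BATCH_MAX)
		mock_batch_flush(q, b);
	b->cmds[b->num++] = cmd;
}
```

In this model, 20 commands followed by a final flush cost two index
updates rather than 20. Whether that matters in practice depends on the
MMIO cost relative to the map/unmap context switches.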

Thanks,
Jean


