[PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2

Fri Oct 20 06:41:29 PDT 2023

On Fri, 20 Oct 2023 14:21:39 +0100,
Ryan Roberts <ryan.roberts at arm.com> wrote:
> 
> On 20/10/2023 14:02, Marc Zyngier wrote:
> > On Fri, 20 Oct 2023 13:39:47 +0100,
> > Ryan Roberts <ryan.roberts at arm.com> wrote:
> >>
> >> On 20/10/2023 09:05, Marc Zyngier wrote:
> >>> Maybe. There is something to be said about making the range rework
> >>> (decreasing scale) an independent patch, as it is a significant change
> >>> on its own. But maybe the rest of the plumbing can be grouped
> >>> together.
> >>
> >> But that's effectively the split I have now, isn't it? The first patch
> >> introduces TLBI_TTL_UNKNOWN to enable use of 0 as a ttl hint. Then the second
> >> patch reworks the range stuff. I don't quite follow what you are suggesting.
> > 
> > Not quite.
> > 
> > What I'm proposing is that you pull the scale changes into their own
> > patch, and preferably without any change to the external API (i.e. no
> > change to the signature of the helper). Then any extra change, such as
> > the TTL rework can go separately.
> > 
> > So while this is similar to your existing split, I'd like to see it
> > without any churn around the calling convention. Which means turning
> > the ordering around, and making use of a static key in the various
> > helpers that need to know about LPA2.
> 
> I don't think we can embed the static key usage directly inside
> __flush_tlb_range_op() (if that's what you were suggesting), because this macro
> is used by both the kernel (for its stage 1) and the hypervisor (for stage 2).
> And the kernel doesn't support LPA2 (until Ard's work is merged). So I think
> this needs to be an argument to the macro.

I can see two outcomes here:

- either you create separate helpers that abstract the LPA2-ness for
  KVM and stick to non-LPA2 for the kernel (until Ard's series makes
  it in)

- or you leave the whole thing disabled until we have full LPA2
  support.

Eventually, you replace the whole extra parameter with a static key,
and nobody sees any churn.
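
To illustrate the first option above, here is a minimal userspace sketch
of what "separate helpers that abstract the LPA2-ness" could look like.
Every name in it is hypothetical and not taken from the series:
flush_range_op_sketch() stands in for the shared range macro,
kvm_lpa2_enabled_sketch stands in for whatever static key or capability
check KVM would eventually use, and the return value only reports which
path would be taken, since no TLBI is actually issued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a static key; a plain flag in this sketch. */
static bool kvm_lpa2_enabled_sketch;

enum range_path_sketch { PATH_NON_LPA2 = 0, PATH_LPA2 = 1 };

/*
 * Hypothetical shared range operation. It takes the LPA2-ness as an
 * explicit argument, as proposed in the quoted text, and reports which
 * path it would use instead of issuing real invalidations.
 */
static enum range_path_sketch
flush_range_op_sketch(unsigned long start, unsigned long pages, bool lpa2)
{
	assert(pages > 0);
	(void)start;	/* unused: no invalidation is performed here */
	return lpa2 ? PATH_LPA2 : PATH_NON_LPA2;
}

/* Kernel stage 1 wrapper: pinned to non-LPA2 until stage 1 supports it. */
static enum range_path_sketch
kernel_flush_range_sketch(unsigned long start, unsigned long pages)
{
	return flush_range_op_sketch(start, pages, false);
}

/* KVM stage 2 wrapper: hides the LPA2 decision from its callers. */
static enum range_path_sketch
kvm_flush_range_sketch(unsigned long start, unsigned long pages)
{
	return flush_range_op_sketch(start, pages, kvm_lpa2_enabled_sketch);
}
```

With this shape, callers never pass the extra parameter themselves, so
swapping the argument for a static key later would only touch the two
wrappers.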

> Or are you asking that I make the scale change universally, even if LPA2 is not
> in use? I could do that as its own change (which I could benchmark), then
> add the rest in a separate change. But my thinking was that we would not want to
> change the algorithm for !LPA2 since it is not as efficient (due to the LPA2 64K
> alignment requirement).

I'm all for simplicity. If having an extra 15 potential TLBIs is
acceptable from a performance perspective, I won't complain. But I can
imagine that NV would be suffering from that (TLBIs on S2 have to
trap).
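
For reference, a rough sketch of where a figure of 15 can come from,
assuming 4K pages and assuming that a range operation may only start at
a 64K-aligned address, so that any pages before that boundary need
individual invalidations. The helper name and its interface are
hypothetical and are not part of the series.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_64K	0x10000ULL

/*
 * Hypothetical helper: number of single-page invalidations needed
 * before 'addr' reaches the next 64K boundary, for a page size of
 * (1 << page_shift) bytes. 'addr' must be page aligned.
 */
static unsigned int pages_until_64k_aligned(uint64_t addr,
					    unsigned int page_shift)
{
	uint64_t page_size = 1ULL << page_shift;
	uint64_t next;

	assert(page_shift >= 12 && page_shift <= 16);
	assert((addr & (page_size - 1)) == 0);

	/* Round up to the next 64K boundary (no-op if already aligned). */
	next = (addr + SKETCH_SZ_64K - 1) & ~(SKETCH_SZ_64K - 1);

	return (unsigned int)((next - addr) >> page_shift);
}
```

With 4K pages the worst case is a start address one page past a 64K
boundary, which gives 64K / 4K - 1 = 15 single-page operations.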

	M.

-- 
Without deviation from the norm, progress is not possible.