[PATCH] arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU

Thu Oct 27 03:04:28 PDT 2016

On Thu, Oct 27, 2016 at 10:49:00AM +0100, Marc Zyngier wrote:
> Hi Christoffer,
> 
> On 27/10/16 10:19, Christoffer Dall wrote:
> > On Mon, Oct 24, 2016 at 04:31:28PM +0100, Marc Zyngier wrote:
> >> Architecturally, TLBs are private to the (physical) CPU they're
> >> associated with. But when multiple vcpus from the same VM are
> >> being multiplexed on the same CPU, the TLBs are not private
> >> to the vcpus (and are actually shared across the VMID).
> >>
> >> Let's consider the following scenario:
> >>
> >> - vcpu-0 maps PA to VA
> >> - vcpu-1 maps PA' to VA
> >>
> >> If run on the same physical CPU, vcpu-1 can hit TLB entries generated
> >> by vcpu-0 accesses, and access the wrong physical page.
> >>
> >> The solution to this is to keep a per-VM map of which vcpu ran last
> >> on each given physical CPU, and invalidate local TLBs when switching
> >> to a different vcpu from the same VM.
> > 
> > Just making sure I understand this:  The reason you cannot rely on the
> > guest doing the necessary distinction with ASIDs or invalidating the TLB
> > is that a guest (which assumes it's running on hardware) can validly
> > defer any necessary invalidation until it starts running on other
> > physical CPUs, but we do this transparently in KVM?
> 
> The guest wouldn't have to do any invalidation at all on real HW,
> because the TLBs are strictly private to a physical CPU (only the
> invalidation can be broadcast to the Inner Shareable domain). But when
> we multiplex two vcpus on the same physical CPU, we break the private
> semantics, and a vcpu could hit in the TLB entries populated by
> another one.

Such a guest would be using a mapping of the same VA with the same ASID
on two separate CPUs, each pointing to a separate PA.  If it ever were
to, say, migrate a task, it would have to do invalidations then.  Right?

Does Linux or other guests actually do this?

I would suspect Linux has to eventually invalidate those mappings if it
wants the scheduler to be allowed to freely move things around.

> 
> As we cannot segregate the TLB entries per vcpu (but only per VMID), the
> workaround is to nuke all the TLBs for this VMID (locally only - no
> broadcast) each time we find that two vcpus are sharing the same
> physical CPU.
> 
> Is that clearer?

Yes, the fix is clear.  I just want to make sure I understand whether
there is a valid circumstance where this actually happens.  But in either
case, we probably have to fix this to emulate the hardware correctly.
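
To make the scenario concrete, here is a rough user-space toy model of the
problem and of the "remember which vcpu ran last on each physical CPU"
bookkeeping described above.  It is only a sketch, not the patch itself:
every name in it (PhysicalCPU, VM.load, flush_vmid_local, run_scenario, the
ASID and address values) is made up for illustration, and it assumes a TLB
that is simply a per-CPU dictionary tagged by (VMID, ASID, VA).

```python
# Toy model only: all names and values are hypothetical, this is not KVM code.

class PhysicalCPU:
    """One physical CPU with a private TLB tagged by (VMID, ASID, VA)."""

    def __init__(self):
        self.tlb = {}

    def flush_vmid_local(self, vmid):
        # Local-only invalidation of every entry for this VMID (no broadcast).
        self.tlb = {k: v for k, v in self.tlb.items() if k[0] != vmid}


class VCPU:
    def __init__(self, vcpu_id, page_table):
        self.vcpu_id = vcpu_id
        self.page_table = page_table  # (asid, va) -> pa, as this vcpu sees it


class VM:
    def __init__(self, vmid, track_last_vcpu):
        self.vmid = vmid
        self.track_last_vcpu = track_last_vcpu
        self.last_vcpu_ran = {}  # physical CPU -> id of the last vcpu run there

    def load(self, vcpu, cpu):
        """Called when vcpu is scheduled onto cpu."""
        if self.track_last_vcpu:
            last = self.last_vcpu_ran.get(cpu)
            if last is not None and last != vcpu.vcpu_id:
                # A different vcpu of the same VM ran here: drop its entries.
                cpu.flush_vmid_local(self.vmid)
            self.last_vcpu_ran[cpu] = vcpu.vcpu_id

    def access(self, vcpu, cpu, asid, va):
        key = (self.vmid, asid, va)
        if key not in cpu.tlb:
            # TLB miss: walk this vcpu's own page table and cache the result.
            cpu.tlb[key] = vcpu.page_table[(asid, va)]
        return cpu.tlb[key]


def run_scenario(track_last_vcpu):
    """vcpu-0 maps VA to PA, vcpu-1 maps the same VA/ASID to PA'."""
    cpu = PhysicalCPU()
    vm = VM(vmid=1, track_last_vcpu=track_last_vcpu)
    vcpu0 = VCPU(0, {(5, 0x1000): 0xA000})
    vcpu1 = VCPU(1, {(5, 0x1000): 0xB000})
    vm.load(vcpu0, cpu)
    vm.access(vcpu0, cpu, 5, 0x1000)  # populates the shared TLB
    vm.load(vcpu1, cpu)
    return vm.access(vcpu1, cpu, 5, 0x1000)
```

In this model, run_scenario(False) returns vcpu-0's PA (the stale hit),
while run_scenario(True) returns vcpu-1's own PA', because the switch to a
different vcpu of the same VM triggers a local flush for that VMID first.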

Another fix would be to allocate a VMID per VCPU I suppose, just to
introduce a terrible TLB hit ratio :)

Thanks,
-Christoffer