[PATCH] arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU

Christoffer Dall christoffer.dall at linaro.org
Thu Oct 27 05:27:57 PDT 2016


On Thu, Oct 27, 2016 at 11:40:00AM +0100, Marc Zyngier wrote:
> On 27/10/16 11:04, Christoffer Dall wrote:
> > On Thu, Oct 27, 2016 at 10:49:00AM +0100, Marc Zyngier wrote:
> >> Hi Christoffer,
> >>
> >> On 27/10/16 10:19, Christoffer Dall wrote:
> >>> On Mon, Oct 24, 2016 at 04:31:28PM +0100, Marc Zyngier wrote:
> >>>> Architecturally, TLBs are private to the (physical) CPU they're
> >>>> associated with. But when multiple vcpus from the same VM are
> >>>> being multiplexed on the same CPU, the TLBs are not private
> >>>> to the vcpus (and are actually shared across the VMID).
> >>>>
> >>>> Let's consider the following scenario:
> >>>>
> >>>> - vcpu-0 maps PA to VA
> >>>> - vcpu-1 maps PA' to VA
> >>>>
> >>>> If run on the same physical CPU, vcpu-1 can hit TLB entries generated
> >>>> by vcpu-0 accesses, and access the wrong physical page.
> >>>>
> >>>> The solution to this is to keep a per-VM map of which vcpu ran last
> >>>> on each given physical CPU, and invalidate local TLBs when switching
> >>>> to a different vcpu from the same VM.
> >>>
> >>> Just making sure I understand this: the reason you cannot rely on the
> >>> guest making the necessary distinction with ASIDs or invalidating the
> >>> TLB is that a guest (which assumes it's running on hardware) can
> >>> validly defer any necessary invalidation until it starts running on
> >>> other physical CPUs, but we do this multiplexing transparently in KVM?
> >>
> >> The guest wouldn't have to do any invalidation at all on real HW,
> >> because the TLBs are strictly private to a physical CPU (only the
> >> invalidation can be broadcast to the Inner Shareable domain). But when
> >> we multiplex two vcpus on the same physical CPU, we break the private
> >> semantics, and a vcpu could hit in the TLB entries populated by
> >> another one.
> > 
> > Such a guest would be using a mapping of the same VA with the same ASID
> > on two separate CPUs, each pointing to a separate PA.  If it ever were
> > to, say, migrate a task, it would have to do invalidations then.  Right?
> 
> This doesn't have to be ASID tagged. Actually, it is more likely to
> affect global mappings. Imagine, for example, that the kernel (which
> uses global mappings for its own page tables) decides to create per-cpu
> variables using this trick (all the CPUs have the same VA, but use
> different PAs). No invalidation at all, everything looks perfectly fine,
> until you start virtualizing it.
> 
> > Does Linux or other guests actually do this?
> 
> Linux may hit it with CPU hotplug, which uses global mappings (which a
> vcpu using an ASID-tagged mapping could then hit if the VAs overlap).
> 

Right, ok, it's more threatening than I first thought.  Thanks for the
explanation.
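
For the archives, here's a minimal sketch of how I read the proposed
fix: keep a per-VM record of the last vcpu that ran on each physical
CPU, and do a local TLB flush whenever a different vcpu of the same VM
is about to run there. Names like last_vcpu_ran and
__kvm_tlb_flush_local_vmid below are my placeholders, not necessarily
what the patch uses:

    struct kvm_arch {
        /* ... */
        /* vcpu_id of the last vcpu of this VM that ran on each
         * physical CPU; initialized to -1 before any vcpu runs. */
        int __percpu *last_vcpu_ran;
    };

    void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
    {
        int *last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);

        /*
         * Another vcpu of this VM ran on this physical CPU last.
         * Its TLB entries are tagged with our VMID, so flush the
         * local TLBs before we can hit them.
         */
        if (*last_ran != vcpu->vcpu_id) {
            kvm_call_hyp(__kvm_tlb_flush_local_vmid, vcpu);
            *last_ran = vcpu->vcpu_id;
        }
        /* ... rest of vcpu_load ... */
    }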

> > 
> > I would suspect Linux has to eventually invalidate those mappings if it
> > wants the scheduler to be allowed to freely move things around.
> > 
> >>
> >> As we cannot segregate the TLB entries per vcpu (but only per VMID), the
> >> workaround is to nuke all the TLBs for this VMID (locally only - no
> >> broadcast) each time we find that two vcpus are sharing the same
> >> physical CPU.
> >>
> >> Is that clearer?
> > 
> > Yes, the fix is clear; I just want to make sure I understand that
> > there's a valid circumstance where this actually happens.  But in
> > either case, we probably have to fix this to emulate the hardware
> > correctly.
> > 
> > Another fix would be to allocate a VMID per VCPU, I suppose, just to
> > introduce a terrible TLB hit ratio :)
> 
> But that would break TLB invalidations that are broadcast in the Inner
> Shareable domain. To do so, you'd have to trap every TLBI, and issue
> corresponding invalidations for all the vcpus. I'm not sure I want to
> see the performance numbers of that solution... ;-)
> 
Ah, yeah, that's ridiculous.  Forget what I said.

Thanks,
-Christoffer
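
P.S. For anyone reading this in the archives: the "locally only - no
broadcast" flush Marc describes could, on arm64, look roughly like the
hyp helper sketched below. The name __kvm_tlb_flush_local_vmid and the
details are my assumptions, not necessarily what the patch does:

    /* Runs at EL2: flush this CPU's TLB entries for the vcpu's VMID,
     * without broadcasting the invalidation to other CPUs. */
    void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
    {
        struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);

        /* Switch to the guest's stage-2 context so the invalidation
         * is tagged with the right VMID. */
        write_sysreg(kvm->arch.vttbr, vttbr_el2);
        isb();

        /* Invalidate all stage-1 entries for the current VMID,
         * local to this CPU (no Inner Shareable broadcast). */
        asm volatile("tlbi vmalle1");
        dsb(nsh);
        isb();

        /* Restore the host's (empty) stage-2 context. */
        write_sysreg(0, vttbr_el2);
    }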


