[PATCH] arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU

Thu Oct 27 03:40:00 PDT 2016

On 27/10/16 11:04, Christoffer Dall wrote:
> On Thu, Oct 27, 2016 at 10:49:00AM +0100, Marc Zyngier wrote:
>> Hi Christoffer,
>>
>> On 27/10/16 10:19, Christoffer Dall wrote:
>>> On Mon, Oct 24, 2016 at 04:31:28PM +0100, Marc Zyngier wrote:
>>>> Architecturally, TLBs are private to the (physical) CPU they're
>>>> associated with. But when multiple vcpus from the same VM are
>>>> being multiplexed on the same CPU, the TLBs are not private
>>>> to the vcpus (and are actually shared across the VMID).
>>>>
>>>> Let's consider the following scenario:
>>>>
>>>> - vcpu-0 maps PA to VA
>>>> - vcpu-1 maps PA' to VA
>>>>
>>>> If run on the same physical CPU, vcpu-1 can hit TLB entries generated
>>>> by vcpu-0 accesses, and access the wrong physical page.
>>>>
>>>> The solution to this is to keep a per-VM map of which vcpu ran last
>>>> on each given physical CPU, and invalidate local TLBs when switching
>>>> to a different vcpu from the same VM.
>>>
>>> Just making sure I understand this:  The reason you cannot rely on the
>>> guest doing the necessary distinction with ASIDs or invalidating the TLB
>>> is that a guest (which assumes it's running on hardware) can validly
>>> defer any necessary invalidation until it starts running on other
>>> physical CPUs, but we do this transparently in KVM?
>>
>> The guest wouldn't have to do any invalidation at all on real HW,
>> because the TLBs are strictly private to a physical CPU (only the
>> invalidation can be broadcast to the Inner Shareable domain). But when
>> we multiplex two vcpus on the same physical CPU, we break the private
>> semantics, and a vcpu could hit in the TLB entries populated by
>> another one.
> 
> Such a guest would be using a mapping of the same VA with the same ASID
> on two separate CPUs, each pointing to a separate PA.  If it ever were
> to, say, migrate a task, it would have to do invalidations then.  Right?

This doesn't have to be ASID tagged. Actually, it is more likely to
affect global mappings. Imagine for example that the kernel (which uses
global mappings for its own page tables) decides to create a per-cpu
variable using this trick (all the CPUs have the same VA, but use
different PAs). No invalidation at all, everything looks perfectly fine,
until you start virtualizing it.
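
As a rough illustration of the failure mode (a toy user-space model, not
kernel code: struct toy_tlb, struct toy_vcpu, toy_translate() and
toy_tlb_flush_vmid() are all made-up names, and the "TLB" is just an
array tagged by VMID and VA), two vcpus of the same VM that map the same
VA to different PAs end up sharing an entry once they run on the same
physical CPU:

```c
#include <assert.h>
#include <stdbool.h>

#define TLB_SLOTS 8

/* One entry of a toy per-physical-CPU TLB. It is tagged by VMID only,
 * so nothing in it identifies which vcpu populated it. */
struct tlb_entry {
	bool valid;
	int vmid;
	unsigned long va;
	unsigned long pa;
};

struct toy_tlb {
	struct tlb_entry slot[TLB_SLOTS];
};

/* Hypothetical vcpu whose page tables hold a single global mapping. */
struct toy_vcpu {
	int vmid;
	unsigned long va;
	unsigned long pa;
};

/* Local invalidation of every entry belonging to one VMID. */
void toy_tlb_flush_vmid(struct toy_tlb *tlb, int vmid)
{
	for (int i = 0; i < TLB_SLOTS; i++)
		if (tlb->slot[i].vmid == vmid)
			tlb->slot[i].valid = false;
}

/* Translate va for vcpu: use a cached entry if one matches the VMID,
 * otherwise "walk" the vcpu's own mapping and cache the result. */
unsigned long toy_translate(struct toy_tlb *tlb, const struct toy_vcpu *vcpu,
			    unsigned long va)
{
	for (int i = 0; i < TLB_SLOTS; i++) {
		struct tlb_entry *e = &tlb->slot[i];

		if (e->valid && e->vmid == vcpu->vmid && e->va == va)
			return e->pa;	/* hit, possibly another vcpu's entry */
	}

	assert(va == vcpu->va);		/* the model only knows one mapping */

	for (int i = 0; i < TLB_SLOTS; i++) {
		struct tlb_entry *e = &tlb->slot[i];

		if (!e->valid) {
			*e = (struct tlb_entry){ true, vcpu->vmid, va, vcpu->pa };
			break;
		}
	}
	return vcpu->pa;
}
```

In this model, if vcpu-0 translates the VA first, vcpu-1 then gets
vcpu-0's PA back until toy_tlb_flush_vmid() is called for the VMID.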

> Does Linux or other guests actually do this?

Linux may hit it with CPU hotplug, which uses global mappings (which a
vcpu using an ASID tagged mapping could then hit if the VAs overlap).

> 
> I would suspect Linux has to eventually invalidate those mappings if it
> wants the scheduler to be allowed to freely move things around.
> 
>>
>> As we cannot segregate the TLB entries per vcpu (but only per VMID), the
>> workaround is to nuke all the TLBs for this VMID (locally only - no
>> broadcast) each time we find that two vcpus are sharing the same
>> physical CPU.
>>
>> Is that clearer?
> 
> Yes, the fix is clear, just want to make sure I understand that it's a
> valid circumstance where this actually happens.  But in either case, we
> probably have to fix this to emulate the hardware correctly.
> 
> Another fix would be to allocate a VMID per VCPU I suppose, just to
> introduce a terrible TLB hit ratio :)

But that would break TLB invalidations that are broadcast in the Inner
Shareable domain. To make that work, you'd have to trap every TLBI, and
issue corresponding invalidations for all the vcpus. I'm not sure I want
to see the performance numbers of that solution... ;-)
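
For comparison, the approach the patch takes (a per-VM record of which
vcpu ran last on each physical CPU, with a local invalidation whenever
that changes) can be sketched roughly as below. This is a user-space
model under my own assumptions, not the code from the patch: struct vm,
NR_CPUS, vm_init(), vcpu_load() and flush_local_tlb_for_vmid() are
illustrative names, and the flush is only counted rather than performed.

```c
#include <assert.h>

#define NR_CPUS	4
#define NO_VCPU	(-1)

/* Hypothetical per-VM state for the sketch. */
struct vm {
	int vmid;
	int last_vcpu_ran[NR_CPUS];	/* vcpu id that ran last on each CPU */
	unsigned long local_flushes;	/* number of local invalidations */
};

void vm_init(struct vm *vm, int vmid)
{
	vm->vmid = vmid;
	vm->local_flushes = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		vm->last_vcpu_ran[cpu] = NO_VCPU;
}

/* Stand-in for a local (non-broadcast) invalidate-by-VMID; a real
 * implementation would issue the TLB maintenance on the current CPU. */
void flush_local_tlb_for_vmid(struct vm *vm, int cpu)
{
	(void)cpu;
	vm->local_flushes++;
}

/* Called when vcpu_id is about to run on cpu. Returns 1 if the local
 * TLB had to be invalidated, 0 if the same vcpu ran there last. */
int vcpu_load(struct vm *vm, int vcpu_id, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (vm->last_vcpu_ran[cpu] == vcpu_id)
		return 0;

	/* Conservative choice for the sketch: also flush on first run. */
	flush_local_tlb_for_vmid(vm, cpu);
	vm->last_vcpu_ran[cpu] = vcpu_id;
	return 1;
}
```

With this, a vcpu that keeps running on the same physical CPU never pays
for a flush; the cost only shows up when two vcpus of the same VM
alternate on one CPU, and the VMID stays shared so broadcast
invalidations still reach all the vcpus.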

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...