[PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Anup Patel anup at brainfault.org
Thu Aug 15 11:45:28 EDT 2013


On Thu, Aug 15, 2013 at 9:07 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
> On 2013-08-15 16:13, Anup Patel wrote:
>>
>> Hi Marc,
>>
>> On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier at arm.com>
>> wrote:
>>>
>>> On 2013-08-15 14:31, Anup Patel wrote:
>>>>
>>>>
>>>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier at arm.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 2013-08-15 07:26, Anup Patel wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier at arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Anup,
>>>>>>>
>>>>>>>
>>>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier at arm.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Pranav,
>>>>>>>>>
>>>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>>>>>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>>>>>>>> we need to flush such dirty content from d-cache so that guest
>>>>>>>>>> will see correct contents of guest page when guest MMU is
>>>>>>>>>> disabled.
>>>>>>>>>>
>>>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>>>>>>>>>> L3-cache.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar at linaro.org>
>>>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
>>>>>>>>>> ---
>>>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> index efe609c..5129038 100644
>>>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>>>>> +             /* Flush d-cache for systems with external caches.
>>>>>>>>>> */
>>>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged
>>>>>>>>>> VIVT
>>>>>>>>>> */
>>>>>>>>>>               /* any kind of VIPT cache */
>>>>>>>>>>               __flush_icache_all();
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [adding Will to the discussion as we talked about this in the past]
>>>>>>>>>
>>>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit the
>>>>>>>>> data
>>>>>>>>> cache on each page mapping. It looks overkill.
>>>>>>>>>
>>>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>>>>>>>> kvmtools
>>>>>>>>> knows which bits of the guest memory have been touched, and can do
>>>>>>>>> a
>>>>>>>>> "DC
>>>>>>>>> DVAC" on this region.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It seems a bit unnatural to have cache cleaning is user-space. I am
>>>>>>>> sure
>>>>>>>> other architectures don't have such cache cleaning in user-space for
>>>>>>>> KVM.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The alternative is do it in the kernel before running any vcpu -
>>>>>>>>> but
>>>>>>>>> that's not very nice either (have to clean the whole of the guest
>>>>>>>>> memory, which makes a full dcache clean more appealing).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Actually, cleaning full d-cache by set/way upon first run of VCPU
>>>>>>>> was
>>>>>>>> our second option but current approach seemed very simple hence
>>>>>>>> we went for this.
>>>>>>>>
>>>>>>>> If more people vote for full d-cache clean upon first run of VCPU
>>>>>>>> then
>>>>>>>> we should revise this patch.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can you please give the attached patch a spin on your HW? I've
>>>>>>> boot-tested
>>>>>>> it on a model, but of course I can't really verify that it fixes your
>>>>>>> issue.
>>>>>>>
>>>>>>> As far as I can see, it should fix it without any additional
>>>>>>> flushing.
>>>>>>>
>>>>>>> Please let me know how it goes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
>>>>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>>>>>> Outer Write-Back Write-Allocate memory"
>>>>>>
>>>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
>>>>>> treated as "Strongly-ordered device memory"
>>>>>>
>>>>>> Now if Guest/VM access hardware MMIO devices directly (such as
>>>>>> VGIC CPU interface) with MMU off then MMIO devices will be
>>>>>> accessed as normal memory when HCR_EL2.DC=1.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I don't think so. Stage-2 still applies, and should force MMIO to be
>>>>> accessed as device memory.
>>>>>
>>>>>
>>>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>>>>> KVM ARM64 because we use VGIC.
>>>>>>
>>>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>>>>> when Stage1 MMU is off.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> See above. My understanding is that HCR.DC controls the default output
>>>>> of
>>>>> Stage-1, and Stage-2 overrides still apply.
>>>>
>>>>
>>>>
>>>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>>>> and wanted guest to decide the memory attribute. In other words, you
>>>> did not want to enforce any memory attribute in Stage2.
>>>>
>>>> Please refer to this patch https://patchwork.kernel.org/patch/2543201/
>>>
>>>
>>>
>>> This patch has never been merged. If you carry on following the
>>> discussion,
>>> you will certainly notice it was dropped for a very good reason:
>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>>
>>> So Stage-2 memory attributes are used, they are not going away, and they
>>> are
>>> essential to the patch I sent this morning.
>>>
>>>
>>>         M.
>>> --
>>> Fast, cheap, reliable. Pick two.
>>
>>
>> HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM is
>> provided a DMA-capable device in pass-through mode. The reason being
>> bootloader/firmware typically don't enable MMU and such
>> bootloader/firmware
>> will programme a pass-through DMA-capable device without any flushes to
>> guest RAM (because it has not enabled MMU).
>>
>> A good example of such a device would be SATA AHCI controller given to a
>> Guest/VM as direct access (using SystemMMU) and Guest bootloader/firmware
>> accessing this SATA AHCI controller to load kernel images from SATA disk.
>> In this situation, we will have to hack Guest bootloader/firmware
>> AHCI driver to
>> explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
>
>
> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> curiosity: is that a made up example or something you actually have?

Actually, this would be a typical use-case in embedded virtualization.
Other devices that fit this use-case would be network controllers,
security engines, or some proprietary HW not having drivers in Linux.

>
> Back to square one:
> Can you please benchmark the various cache cleaning options (global at
> startup time, per-page on S2 translation fault, and user-space)?
>
> Thanks,
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

--Anup



More information about the linux-arm-kernel mailing list