[PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Marc Zyngier marc.zyngier at arm.com
Thu Aug 15 11:37:27 EDT 2013


On 2013-08-15 16:13, Anup Patel wrote:
> Hi Marc,
>
> On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier at arm.com> 
> wrote:
>> On 2013-08-15 14:31, Anup Patel wrote:
>>>
>>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier 
>>> <marc.zyngier at arm.com>
>>> wrote:
>>>>
>>>> On 2013-08-15 07:26, Anup Patel wrote:
>>>>>
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier 
>>>>> <marc.zyngier at arm.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Anup,
>>>>>>
>>>>>>
>>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>>>>
>>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier 
>>>>>>> <marc.zyngier at arm.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Pranav,
>>>>>>>>
>>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>>>>
>>>>>>>>> Systems with a large external L3-cache (a few MBs) might have
>>>>>>>>> dirty content belonging to a guest page in the L3-cache. To
>>>>>>>>> tackle this, we need to flush such dirty content from the
>>>>>>>>> d-cache so that the guest sees the correct contents of the
>>>>>>>>> guest page when the guest MMU is disabled.
>>>>>>>>>
>>>>>>>>> This patch fixes coherent_icache_guest_page() for external
>>>>>>>>> L3-caches.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar 
>>>>>>>>> <pranavkumar at linaro.org>
>>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
>>>>>>>>> ---
>>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> index efe609c..5129038 100644
>>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>>>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>>>> +             /* Flush d-cache for systems with external caches. */
>>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>>>>>>>               /* any kind of VIPT cache */
>>>>>>>>>               __flush_icache_all();
>>>>>>>>
>>>>>>>> [adding Will to the discussion as we talked about this in the past]
>>>>>>>>
>>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>>>>>>>> the data cache on each page mapping. It looks like overkill.
>>>>>>>>
>>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>>>>>>> kvmtool knows which bits of the guest memory have been touched,
>>>>>>>> and can do a "DC CVAC" on this region.
>>>>>>>
>>>>>>> It seems a bit unnatural to have cache cleaning in user-space. I
>>>>>>> am sure other architectures don't have such cache cleaning in
>>>>>>> user-space for KVM.
>>>>>>>
>>>>>>>>
>>>>>>>> The alternative is to do it in the kernel before running any
>>>>>>>> vcpu - but that's not very nice either (we would have to clean
>>>>>>>> the whole of the guest memory, which makes a full dcache clean
>>>>>>>> more appealing).
>>>>>>>
>>>>>>> Actually, cleaning the full d-cache by set/way upon the first
>>>>>>> run of a VCPU was our second option, but the current approach
>>>>>>> seemed very simple, hence we went for it.
>>>>>>>
>>>>>>> If more people vote for a full d-cache clean upon the first run
>>>>>>> of a VCPU then we will revise this patch.
>>>>>>
>>>>>> Can you please give the attached patch a spin on your HW? I've
>>>>>> boot-tested it on a model, but of course I can't really verify
>>>>>> that it fixes your issue.
>>>>>>
>>>>>> As far as I can see, it should fix it without any additional
>>>>>> flushing.
>>>>>>
>>>>>> Please let me know how it goes.
>>>>>
>>>>> HCR_EL2.DC=1 means that all memory accesses with the Stage-1 MMU
>>>>> off are treated as "Normal Non-shareable, Inner Write-Back
>>>>> Write-Allocate, Outer Write-Back Write-Allocate memory".
>>>>>
>>>>> HCR_EL2.DC=0 means that all memory accesses with the Stage-1 MMU
>>>>> off are treated as "Strongly-ordered device memory".
>>>>>
>>>>> Now, if the Guest/VM accesses hardware MMIO devices directly (such
>>>>> as the VGIC CPU interface) with the MMU off, then those MMIO
>>>>> devices will be accessed as normal memory when HCR_EL2.DC=1.
>>>>
>>>> I don't think so. Stage-2 still applies, and should force MMIO to
>>>> be accessed as device memory.
>>>>
>>>>
>>>>> The HCR_EL2.DC=1 setting makes sense only if all devices for the
>>>>> Guest/VM are software-emulated, which is not true for KVM ARM or
>>>>> KVM ARM64 because we use the VGIC.
>>>>>
>>>>> IMHO, this patch enforces an incorrect memory attribute for the
>>>>> Guest/VM when the Stage-1 MMU is off.
>>>>
>>>> See above. My understanding is that HCR.DC controls the default
>>>> output of Stage-1, and Stage-2 overrides still apply.
>>>
>>>
>>> You had mentioned that the PAGE_S2_DEVICE attribute was redundant
>>> and that you wanted the guest to decide the memory attribute. In
>>> other words, you did not want to enforce any memory attribute at
>>> Stage-2.
>>>
>>> Please refer to this patch:
>>> https://patchwork.kernel.org/patch/2543201/
>>
>>
>> This patch has never been merged. If you carry on following the
>> discussion, you will certainly notice it was dropped for a very good
>> reason:
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>
>> So Stage-2 memory attributes are used, they are not going away, and
>> they are essential to the patch I sent this morning.
>>
>>
>>         M.
>> --
>> Fast, cheap, reliable. Pick two.
>
> HCR_EL2.DC=1 will break the Guest/VM bootloader/firmware if the
> Guest/VM is provided a DMA-capable device in pass-through mode. The
> reason is that bootloader/firmware typically doesn't enable the MMU,
> and such bootloader/firmware will program a pass-through DMA-capable
> device without any flushes to guest RAM (because it has not enabled
> the MMU).
>
> A good example of such a device would be a SATA AHCI controller given
> to a Guest/VM as direct access (using a System MMU), with the guest
> bootloader/firmware accessing this SATA AHCI controller to load
> kernel images from a SATA disk. In this situation, we would have to
> hack the guest bootloader/firmware AHCI driver to do explicit flushes
> to guest RAM (because we have HCR_EL2.DC=1).

OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of 
curiosity: is that a made-up example or something you actually have?

Back to square one:
Can you please benchmark the various cache cleaning options (global at 
startup time, per-page on S2 translation fault, and user-space)?

Thanks,

         M.
-- 
Fast, cheap, reliable. Pick two.



More information about the linux-arm-kernel mailing list