[PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Anup Patel anup at brainfault.org
Fri Aug 16 02:57:55 EDT 2013


On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup at brainfault.org> wrote:
> On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> <christoffer.dall at linaro.org> wrote:
>> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>>> On 2013-08-15 16:13, Anup Patel wrote:
>>> > Hi Marc,
>>> >
>>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier at arm.com>
>>> > wrote:
>>> >> On 2013-08-15 14:31, Anup Patel wrote:
>>> >>>
>>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>>> >>> <marc.zyngier at arm.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>>> >>>>>
>>> >>>>>
>>> >>>>> Hi Marc,
>>> >>>>>
>>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>>> >>>>> <marc.zyngier at arm.com>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Hi Anup,
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>>> >>>>>>> <marc.zyngier at arm.com>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> Hi Pranav,
>>> >>>>>>>>
>>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
>>> >>>>>>>>> dirty
>>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
>>> >>>>>>>>> this,
>>> >>>>>>>>> we need to flush such dirty content from d-cache so that
>>> >>>>>>>>> guest
>>> >>>>>>>>> will see correct contents of guest page when guest MMU is
>>> >>>>>>>>> disabled.
>>> >>>>>>>>>
>>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>>> >>>>>>>>> L3-cache.
>>> >>>>>>>>>
>>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
>>> >>>>>>>>> <pranavkumar at linaro.org>
>>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
>>> >>>>>>>>> ---
>>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>> >>>>>>>>>  1 file changed, 2 insertions(+)
>>> >>>>>>>>>
>>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> index efe609c..5129038 100644
>>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
>>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>> >>>>>>>>> +             /* Flush d-cache for systems with external
>>> >>>>>>>>> caches. */
>>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
>>> >>>>>>>>> ASID-tagged
>>> >>>>>>>>> VIVT
>>> >>>>>>>>> */
>>> >>>>>>>>>               /* any kind of VIPT cache */
>>> >>>>>>>>>               __flush_icache_all();
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>>> >>>>>>>> past]
>>> >>>>>>>>
>>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
>>> >>>>>>>> the
>>> >>>>>>>> data
>>> >>>>>>>> cache on each page mapping. It looks overkill.
>>> >>>>>>>>
>>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>> >>>>>>>> kvmtools
>>> >>>>>>>> knows which bits of the guest memory have been touched, and
>>> >>>>>>>> can do a
>>> >>>>>>>> "DC
>>> >>>>>>>> DVAC" on this region.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> It seems a bit unnatural to have cache cleaning in user-space.
>>> >>>>>>> I am sure other architectures don't have such cache cleaning
>>> >>>>>>> in user-space for KVM.
>>> >>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
>>> >>>>>>>> - but
>>> >>>>>>>> that's not very nice either (have to clean the whole of the
>>> >>>>>>>> guest
>>> >>>>>>>> memory, which makes a full dcache clean more appealing).
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
>>> >>>>>>> VCPU was
>>> >>>>>>> our second option but current approach seemed very simple hence
>>> >>>>>>> we went for this.
>>> >>>>>>>
>>> >>>>>>> If more people vote for full d-cache clean upon first run of
>>> >>>>>>> VCPU then
>>> >>>>>>> we should revise this patch.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>>> >>>>>> boot-tested
>>> >>>>>> it on a model, but of course I can't really verify that it fixes
>>> >>>>>> your
>>> >>>>>> issue.
>>> >>>>>>
>>> >>>>>> As far as I can see, it should fix it without any additional
>>> >>>>>> flushing.
>>> >>>>>>
>>> >>>>>> Please let me know how it goes.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> HCR_EL2.DC=1 means all memory accesses with the Stage 1 MMU
>>> >>>>> off are treated as "Normal Non-shareable, Inner Write-Back
>>> >>>>> Write-Allocate, Outer Write-Back Write-Allocate memory"
>>> >>>>>
>>> >>>>> HCR_EL2.DC=0 means all memory accesses with the Stage 1 MMU
>>> >>>>> off are treated as "Strongly-ordered device memory"
>>> >>>>>
>>> >>>>> Now if the Guest/VM accesses hardware MMIO devices directly
>>> >>>>> (such as the VGIC CPU interface) with the MMU off, then MMIO
>>> >>>>> devices will be accessed as normal memory when HCR_EL2.DC=1.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
>>> >>>> be
>>> >>>> accessed as device memory.
>>> >>>>
>>> >>>>
>>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
>>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
>>> >>>>> KVM ARM64 because we use VGIC.
>>> >>>>>
>>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
>>> >>>>> when Stage1 MMU is off.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> See above. My understanding is that HCR.DC controls the default
>>> >>>> output of
>>> >>>> Stage-1, and Stage-2 overrides still apply.
>>> >>>
>>> >>>
>>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
>>> >>> and wanted guest to decide the memory attribute. In other words,
>>> >>> you
>>> >>> did not want to enforce any memory attribute in Stage2.
>>> >>>
>>> >>> Please refer to this patch
>>> >>> https://patchwork.kernel.org/patch/2543201/
>>> >>
>>> >>
>>> >> This patch has never been merged. If you carry on following the
>>> >> discussion,
>>> >> you will certainly notice it was dropped for a very good reason:
>>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>> >>
>>> >> So Stage-2 memory attributes are used, they are not going away, and
>>> >> they are
>>> >> essential to the patch I sent this morning.
>>> >>
>>> >>
>>> >>         M.
>>> >> --
>>> >> Fast, cheap, reliable. Pick two.
>>> >
>>> > HCR_EL2.DC=1 will break the Guest/VM bootloader/firmware if the
>>> > Guest/VM is provided a DMA-capable device in pass-through mode.
>>> > The reason is that bootloaders/firmware typically don't enable the
>>> > MMU, and such bootloader/firmware will program a pass-through
>>> > DMA-capable device without any flushes to guest RAM (because the
>>> > MMU is not enabled).
>>> >
>>> > A good example of such a device would be SATA AHCI controller given
>>> > to a
>>> > Guest/VM as direct access (using SystemMMU) and Guest
>>> > bootloader/firmware
>>> > accessing this SATA AHCI controller to load kernel images from SATA
>>> > disk.
>>> > In this situation, we will have to hack the Guest bootloader/firmware
>>> > AHCI driver to explicitly flush to Guest RAM (because we have
>>> > HCR_EL2.DC=1).
>>>
>>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>>> curiosity: is that a made up example or something you actually have?
>>>
>>> Back to square one:
>>> Can you please benchmark the various cache cleaning options (global at
>>> startup time, per-page on S2 translation fault, and user-space)?
>>>
>> Eh, why is this a more valid argument than the vgic?  The device
>> passthrough Stage-2 mappings would still have the Stage-2 memory
>> attributes to configure that memory region as device memory.  Why is it
>> relevant if the device is DMA-capable in this context?
>>
>> -Christoffer
>
> Most ARM bootloaders/firmware run with the MMU off; hence they will not
> do explicit cache flushes when programming a DMA-capable device. Now if
> HCR_EL2.DC=1, Guest RAM updates done by the bootloader/firmware will
> not be visible to the DMA-capable device.
>
> --Anup

The approach of flushing the d-cache by set/way upon first run of a VCPU
will not work because, for set/way operations, the ARM ARM says: "For
set/way operations, and for All (entire cache) operations, the point is
defined to be to the next level of caching". In other words, set/way
operations only work up to the point of unification.

Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
at boot time, which does not work for us when the L3-cache is enabled,
so there is no point in adding one more __flush_dcache_all() upon first
run of a VCPU in KVM ARM64.

IMHO, we are left with the following options:
1. Flush all RAM regions of the VCPU using __flush_dcache_range()
    upon first run of the VCPU
2. Implement an outer-cache framework for ARM64 and flush all
    caches + outer cache (i.e. L3-cache) upon first run of the VCPU
3. Use an alternate version of flush_icache_range() which will
    flush the d-cache by PoC instead of PoU. We can also ensure
    that coherent_icache_guest_page() is called upon Stage 2
    prefetch aborts only.
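To illustrate option 3, a rough sketch of what such an alternate routine
could look like is below. This is purely hypothetical code, not the actual
kernel implementation: the function name __flush_icache_range_poc is made
up, the cache line size is assumed rather than read from CTR_EL0, and the
loop is modeled loosely on the arm64 flush_icache_range() pattern. The key
difference from the stock routine is using DC CIVAC (clean+invalidate to
the Point of Coherency) instead of DC CVAU (clean to the Point of
Unification), so that dirty lines sitting in an external L3-cache are also
written back to RAM.

static void __flush_icache_range_poc(unsigned long start, unsigned long end)
{
	unsigned long addr;
	/* Assumed line size; real code would derive this from CTR_EL0 */
	const unsigned long line = 64;

	/* Clean + invalidate d-cache to PoC instead of cleaning to PoU */
	for (addr = start & ~(line - 1); addr < end; addr += line)
		asm volatile("dc civac, %0" : : "r" (addr) : "memory");
	asm volatile("dsb ish" : : : "memory");

	/* Invalidate i-cache by VA, as flush_icache_range() does */
	for (addr = start & ~(line - 1); addr < end; addr += line)
		asm volatile("ic ivau, %0" : : "r" (addr) : "memory");
	asm volatile("dsb ish" : : : "memory");
	asm volatile("isb" : : : "memory");
}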

Regards,
Anup
