[PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Christoffer Dall christoffer.dall at linaro.org
Fri Aug 16 13:14:49 EDT 2013


On Fri, Aug 16, 2013 at 10:32:30AM +0530, Anup Patel wrote:
> On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> <christoffer.dall at linaro.org> wrote:
> > On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> >> On 2013-08-15 16:13, Anup Patel wrote:
> >> > Hi Marc,
> >> >
> >> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier at arm.com>
> >> > wrote:
> >> >> On 2013-08-15 14:31, Anup Patel wrote:
> >> >>>
> >> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> >> >>> <marc.zyngier at arm.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >> >>>>>
> >> >>>>>
> >> >>>>> Hi Marc,
> >> >>>>>
> >> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> >> >>>>> <marc.zyngier at arm.com>
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Hi Anup,
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >> >>>>>>>

> >> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> >> >>>>>>> <marc.zyngier at arm.com>
> >> >>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Hi Pranav,
> >> >>>>>>>>
> >> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> Systems with a large external L3-cache (a few MBs) might have
> >> >>>>>>>>> dirty content belonging to the guest page in the L3-cache. To
> >> >>>>>>>>> tackle this, we need to flush such dirty content from the
> >> >>>>>>>>> d-cache so that the guest will see the correct contents of the
> >> >>>>>>>>> guest page when the guest MMU is disabled.
> >> >>>>>>>>>
> >> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> >> >>>>>>>>> L3-cache.
> >> >>>>>>>>>
> >> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> >> >>>>>>>>> <pranavkumar at linaro.org>
> >> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
> >> >>>>>>>>> ---
> >> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >> >>>>>>>>>  1 file changed, 2 insertions(+)
> >> >>>>>>>>>
> >> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> index efe609c..5129038 100644
> >> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >> >>>>>>>>> +             /* Flush d-cache for systems with external caches. */
> >> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
> >> >>>>>>>>>               /* any kind of VIPT cache */
> >> >>>>>>>>>               __flush_icache_all();
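For context, a hedged summary of what the two arm64 cache helpers in this hunk
do (based on arch/arm64/mm/cache.S of this era):

	/*
	 * flush_icache_range(start, end): "dc cvau" (clean D-cache to the
	 * point of unification) plus "ic ivau" (invalidate I-cache) over the
	 * range - enough for I/D coherency, but dirty lines can still sit in
	 * an external L3 beyond the PoU.
	 *
	 * __flush_dcache_area(addr, size): "dc civac" (clean+invalidate
	 * D-cache to the point of coherency) per line - this is what pushes
	 * dirty data past the L3 so that a non-cacheable access from the
	 * guest sees it.
	 */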
> >> >>>>>>>>
> >> >>>>>>>> [adding Will to the discussion as we talked about this in the
> >> >>>>>>>> past]
> >> >>>>>>>>
> >> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> >> >>>>>>>> the data cache on each page mapping. It looks overkill.
> >> >>>>>>>>
> >> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >> >>>>>>>> kvmtools knows which bits of the guest memory have been touched,
> >> >>>>>>>> and can do a "DC DVAC" on this region.
> >> >>>>>>>
> >> >>>>>>> It seems a bit unnatural to have cache cleaning in user-space.
> >> >>>>>>> I am sure other architectures don't have such cache cleaning in
> >> >>>>>>> user-space for KVM.
> >> >>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> The alternative is to do it in the kernel before running any
> >> >>>>>>>> vcpu - but that's not very nice either (we'd have to clean the
> >> >>>>>>>> whole of the guest memory, which makes a full dcache clean more
> >> >>>>>>>> appealing).
> >> >>>>>>>
> >> >>>>>>> Actually, cleaning the full d-cache by set/way upon first run of
> >> >>>>>>> a VCPU was our second option, but the current approach seemed
> >> >>>>>>> very simple, hence we went for this.
> >> >>>>>>>
> >> >>>>>>> If more people vote for a full d-cache clean upon first run of a
> >> >>>>>>> VCPU then we should revise this patch.
> >> >>>>>>
> >> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >> >>>>>> boot-tested it on a model, but of course I can't really verify
> >> >>>>>> that it fixes your issue.
> >> >>>>>>
> >> >>>>>> As far as I can see, it should fix it without any additional
> >> >>>>>> flushing.
> >> >>>>>>
> >> >>>>>> Please let me know how it goes.
> >> >>>>>
> >> >>>>> HCR_EL2.DC=1 means all memory accesses with the Stage1 MMU off are
> >> >>>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
> >> >>>>> Outer Write-Back Write-Allocate memory".
> >> >>>>>
> >> >>>>> HCR_EL2.DC=0 means all memory accesses with the Stage1 MMU off are
> >> >>>>> treated as "Strongly-ordered device memory".
> >> >>>>>
> >> >>>>> Now if the Guest/VM accesses hardware MMIO devices directly (such
> >> >>>>> as the VGIC CPU interface) with the MMU off, then those MMIO
> >> >>>>> devices will be accessed as normal memory when HCR_EL2.DC=1.
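For reference, the control bit under discussion (a hedged illustration; the
actual flag handling depends on Marc's attached patch, which is not shown in
this thread):

	/* HCR_EL2.DC is bit 12. With it set, a guest running with its
	 * Stage-1 MMU off gets Normal Write-Back memory as the default
	 * memory type instead of Device/Strongly-ordered. */
	#define HCR_DC		(UL(1) << 12)

	vcpu->arch.hcr_el2 |= HCR_DC;	/* illustrative only */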
> >> >>>>
> >> >>>> I don't think so. Stage-2 still applies, and should force MMIO to be
> >> >>>> accessed as device memory.
> >> >>>>
> >> >>>>
> >> >>>>> HCR_EL2.DC=1 makes sense only if all devices for the Guest/VM are
> >> >>>>> software emulated, which is not true for KVM ARM or KVM ARM64
> >> >>>>> because we use the VGIC.
> >> >>>>>
> >> >>>>> IMHO, this patch enforces an incorrect memory attribute for the
> >> >>>>> Guest/VM when the Stage1 MMU is off.
> >> >>>>
> >> >>>> See above. My understanding is that HCR.DC controls the default
> >> >>>> output of Stage-1, and Stage-2 overrides still apply.
> >> >>>
> >> >>>
> >> >>> You had mentioned that the PAGE_S2_DEVICE attribute was redundant
> >> >>> and that you wanted the guest to decide the memory attribute. In
> >> >>> other words, you did not want to enforce any memory attribute in
> >> >>> Stage-2.
> >> >>>
> >> >>> Please refer to this patch:
> >> >>> https://patchwork.kernel.org/patch/2543201/
> >> >>
> >> >>
> >> >> This patch has never been merged. If you carry on following the
> >> >> discussion, you will certainly notice it was dropped for a very good
> >> >> reason:
> >> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >> >>
> >> >> So Stage-2 memory attributes are used, they are not going away, and
> >> >> they are essential to the patch I sent this morning.
> >> >>
> >> >>
> >> >>         M.
> >> >> --
> >> >> Fast, cheap, reliable. Pick two.
> >> >
> >> > HCR_EL2.DC=1 will break the Guest/VM bootloader/firmware if the
> >> > Guest/VM is provided a DMA-capable device in pass-through mode. The
> >> > reason is that bootloader/firmware typically does not enable the MMU,
> >> > and such bootloader/firmware will programme a pass-through DMA-capable
> >> > device without any flushes to guest RAM (because it has not enabled
> >> > the MMU).
> >> >
> >> > A good example of such a device would be a SATA AHCI controller given
> >> > to a Guest/VM as direct access (using the SystemMMU), with the Guest
> >> > bootloader/firmware accessing this SATA AHCI controller to load kernel
> >> > images from a SATA disk. In this situation, we would have to hack the
> >> > Guest bootloader/firmware AHCI driver to do explicit flushes to Guest
> >> > RAM (because HCR_EL2.DC=1).
> >>
> >> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> >> curiosity: is that a made up example or something you actually have?
> >>
> >> Back to square one:
> >> Can you please benchmark the various cache cleaning options (global at
> >> startup time, per-page on S2 translation fault, and user-space)?
> >>
> > Eh, why is this a more valid argument than the vgic?  The device
> > passthrough Stage-2 mappings would still have the Stage-2 memory
> > attributes to configure that memory region as device memory.  Why is it
> > relevant if the device is DMA-capable in this context?
> >
> 
> Most ARM bootloaders/firmware run with the MMU off; hence they will not do
> explicit cache flushes when programming a DMA-capable device. Now, if
> HCR_EL2.DC=1, then Guest RAM updates done by the bootloader/firmware will
> not be visible to the DMA-capable device.
>
Read my question again: the bootloaders running with the MMU off in a VM
only disable the MMU for Stage-1 translations.  Stage-2 translations are
still enforced using the Stage-2 page tables, and their attributes for all
mappings to devices will still enforce Strongly-ordered device-type memory.
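A sketch of the mechanism being described, loosely based on the KVM/ARM MMIO
mapping path of this era (PAGE_S2_DEVICE, kvm_set_s2pte_writable() and
stage2_set_pte() are names from that code; the details here are illustrative,
not a verbatim excerpt):

	/* MMIO regions are installed at Stage-2 with device attributes, so a
	 * guest with its Stage-1 MMU off still gets a Device mapping and can
	 * never obtain a cacheable view of the device. */
	pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE);

	kvm_set_s2pte_writable(&pte);
	ret = stage2_set_pte(kvm, &cache, addr, &pte, true);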

Again, what does it matter if the device is DMA-capable or not?

-Christoffer


