[PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Anup Patel anup at brainfault.org
Fri Aug 16 01:02:30 EDT 2013


On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
<christoffer.dall at linaro.org> wrote:
> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>> On 2013-08-15 16:13, Anup Patel wrote:
>> > Hi Marc,
>> >
>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier at arm.com>
>> > wrote:
>> >> On 2013-08-15 14:31, Anup Patel wrote:
>> >>>
>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
>> >>> <marc.zyngier at arm.com>
>> >>> wrote:
>> >>>>
>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>> >>>>>
>> >>>>>
>> >>>>> Hi Marc,
>> >>>>>
>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
>> >>>>> <marc.zyngier at arm.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> Hi Anup,
>> >>>>>>
>> >>>>>>
>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>> >>>>>>>
>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
>> >>>>>>> <marc.zyngier at arm.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi Pranav,
>> >>>>>>>>
>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> >>>>>>>>>
>> >>>>>>>>> Systems with a large external L3-cache (a few MBs) might
>> >>>>>>>>> have dirty content belonging to the guest page in the
>> >>>>>>>>> L3-cache. To tackle this, we need to flush such dirty
>> >>>>>>>>> content from the d-cache so that the guest sees the correct
>> >>>>>>>>> contents of the guest page when the guest MMU is disabled.
>> >>>>>>>>>
>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
>> >>>>>>>>> L3-cache.
>> >>>>>>>>>
>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar at linaro.org>
>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
>> >>>>>>>>> ---
>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>> >>>>>>>>>  1 file changed, 2 insertions(+)
>> >>>>>>>>>
>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> index efe609c..5129038 100644
>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> >>>>>>>>> +             /* Flush d-cache for systems with external caches. */
>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>> >>>>>>>>>               /* any kind of VIPT cache */
>> >>>>>>>>>               __flush_icache_all();
>> >>>>>>>>
>> >>>>>>>> [adding Will to the discussion as we talked about this in the
>> >>>>>>>> past]
>> >>>>>>>>
>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to
>> >>>>>>>> hit the data cache on each page mapping. It looks overkill.
>> >>>>>>>>
>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>> >>>>>>>> kvmtool knows which bits of the guest memory have been
>> >>>>>>>> touched, and can do a "DC CVAC" on this region.
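For concreteness, such a user-space clean might look roughly like the
following (a sketch only: it assumes EL0 access to "dc cvac" has been
enabled via SCTLR_EL1.UCI, and the helper name is made up):

#include <stdint.h>
#include <stddef.h>

/* Clean a VA range to PoC with DC CVAC, using the minimal d-cache
 * line size advertised in CTR_EL0.DminLine. */
static void clean_dcache_range(void *start, size_t size)
{
        uint64_t ctr, line, va;
        uint64_t end = (uint64_t)start + size;

        asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
        line = 4UL << ((ctr >> 16) & 0xf);      /* DminLine, in bytes */

        for (va = (uint64_t)start & ~(line - 1); va < end; va += line)
                asm volatile("dc cvac, %0" : : "r" (va) : "memory");
        asm volatile("dsb sy" : : : "memory");  /* complete the cleans */
}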
>> >>>>>>>
>> >>>>>>> It seems a bit unnatural to have cache cleaning in
>> >>>>>>> user-space; I am sure other architectures don't have such
>> >>>>>>> cache cleaning in user-space for KVM.
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> The alternative is to do it in the kernel before running any
>> >>>>>>>> vcpu - but that's not very nice either (we'd have to clean
>> >>>>>>>> the whole of the guest memory, which makes a full dcache
>> >>>>>>>> clean more appealing).
>> >>>>>>>
>> >>>>>>> Actually, cleaning the full d-cache by set/way upon the first
>> >>>>>>> run of a VCPU was our second option, but the current approach
>> >>>>>>> seemed much simpler, hence we went with it.
>> >>>>>>>
>> >>>>>>> If more people vote for a full d-cache clean upon the first
>> >>>>>>> run of a VCPU then we should revise this patch.
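For the record, that second option would have been something along
these lines (a sketch: flush_cache_all() and vcpu->arch.has_run_once
are assumed to be available, as in the ARM KVM code of this era):

/* Clean the whole d-cache by set/way once, before the first run of
 * this VCPU, instead of flushing per guest page. */
static void kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
{
        if (vcpu->arch.has_run_once)
                return;
        vcpu->arch.has_run_once = true;

        /* Whatever the host wrote into guest RAM may still be dirty
         * in the cache hierarchy; push it out before the guest runs
         * with its MMU (and thus its caches) off. */
        flush_cache_all();
}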
>> >>>>>>
>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>> >>>>>> boot-tested it on a model, but of course I can't really verify
>> >>>>>> that it fixes your issue.
>> >>>>>>
>> >>>>>> As far as I can see, it should fix it without any additional
>> >>>>>> flushing.
>> >>>>>>
>> >>>>>> Please let me know how it goes.
>> >>>>>
>> >>>>> HCR_EL2.DC=1 means all memory accesses with the Stage-1 MMU off
>> >>>>> are treated as "Normal Non-shareable, Inner Write-Back
>> >>>>> Write-Allocate, Outer Write-Back Write-Allocate memory".
>> >>>>>
>> >>>>> HCR_EL2.DC=0 means all memory accesses with the Stage-1 MMU off
>> >>>>> are treated as "Strongly-ordered device memory".
>> >>>>>
>> >>>>> Now if the Guest/VM accesses hardware MMIO devices directly
>> >>>>> (such as the VGIC CPU interface) with the MMU off, then MMIO
>> >>>>> devices will be accessed as normal memory when HCR_EL2.DC=1.
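For reference, DC is bit 12 of HCR_EL2 (the kernel already defines
HCR_DC); a patch enabling it would essentially OR it into the guest's
HCR_EL2 image, along these lines (a sketch; the vcpu field name is an
assumption):

/* HCR_DC is bit 12 of HCR_EL2 per the ARMv8 ARM; vcpu->arch.hcr_el2
 * is assumed to hold the guest's HCR_EL2 image. */
static void vcpu_default_cacheable(struct kvm_vcpu *vcpu)
{
        vcpu->arch.hcr_el2 |= HCR_DC;   /* Stage-1-off accesses default
                                           to Normal Write-Back memory */
}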
>> >>>>
>> >>>> I don't think so. Stage-2 still applies, and should force MMIO
>> >>>> to be accessed as device memory.
>> >>>>
>> >>>>
>> >>>>> HCR_EL2.DC=1 makes sense only if all devices for the Guest/VM
>> >>>>> are software-emulated, which is not true for KVM ARM or KVM
>> >>>>> ARM64 because we use the VGIC.
>> >>>>>
>> >>>>> IMHO, this patch enforces an incorrect memory attribute for the
>> >>>>> Guest/VM when the Stage-1 MMU is off.
>> >>>>
>> >>>> See above. My understanding is that HCR.DC controls the default
>> >>>> output of Stage-1, and Stage-2 overrides still apply.
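The Stage-2 override in question comes from the MemAttr[3:0] field in
the Stage-2 descriptor (PTE bits [5:2]); roughly, modelled on the
arm64 definitions (names may differ by kernel version):

#define PTE_S2_MEMATTR(t)       ((t) << 2)
#define MT_S2_DEVICE_nGnRE      0x1     /* Device-nGnRE: MMIO stays
                                           device memory whatever
                                           HCR_EL2.DC says */
#define MT_S2_NORMAL            0xf     /* Normal, Inner/Outer WB */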
>> >>>
>> >>>
>> >>> You had mentioned that the PAGE_S2_DEVICE attribute was redundant
>> >>> and that you wanted the guest to decide the memory attribute. In
>> >>> other words, you did not want to enforce any memory attribute in
>> >>> Stage-2.
>> >>>
>> >>> Please refer to this patch:
>> >>> https://patchwork.kernel.org/patch/2543201/
>> >>
>> >>
>> >> This patch has never been merged. If you carry on following the
>> >> discussion, you will certainly notice it was dropped for a very
>> >> good reason:
>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>> >>
>> >> So Stage-2 memory attributes are used, they are not going away, and
>> >> they are essential to the patch I sent this morning.
>> >>
>> >>
>> >>         M.
>> >> --
>> >> Fast, cheap, reliable. Pick two.
>> >
>> > HCR_EL2.DC=1 will break the Guest/VM bootloader/firmware if the
>> > Guest/VM is given a DMA-capable device in pass-through mode. The
>> > reason is that bootloaders/firmware typically don't enable the MMU,
>> > and such a bootloader/firmware will program a pass-through
>> > DMA-capable device without any cache flushes to guest RAM (because
>> > it has not enabled the MMU).
>> >
>> > A good example of such a device would be a SATA AHCI controller
>> > given to a Guest/VM as direct access (using a SystemMMU), with the
>> > Guest bootloader/firmware accessing this SATA AHCI controller to
>> > load kernel images from a SATA disk. In this situation, we would
>> > have to hack the Guest bootloader/firmware AHCI driver to do
>> > explicit flushes to Guest RAM (because HCR_EL2.DC=1).
>>
>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>> curiosity: is that a made up example or something you actually have?
>>
>> Back to square one:
>> Can you please benchmark the various cache cleaning options (global at
>> startup time, per-page on S2 translation fault, and user-space)?
>>
> Eh, why is this a more valid argument than the vgic?  The device
> passthrough Stage-2 mappings would still have the Stage-2 memory
> attributes to configure that memory region as device memory.  Why is it
> relevant if the device is DMA-capable in this context?
>
> -Christoffer

Most ARM bootloaders/firmware run with the MMU off; hence, they will
not do explicit cache flushes when programming a DMA-capable device.
Now if HCR_EL2.DC=1, then Guest RAM updates done by the
bootloader/firmware will not be visible to the DMA-capable device.
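To illustrate, a much-simplified, hypothetical AHCI-style firmware
sequence (names made up): with the MMU off the firmware does no cache
maintenance before ringing the doorbell, so with HCR_EL2.DC=1 the
command it built may still be sitting in the cache when the controller
fetches it from RAM.

#include <stdint.h>

struct ahci_port {
        volatile uint32_t ci;           /* command-issue doorbell */
};

void issue_read(struct ahci_port *port, uint8_t *cmd_table)
{
        cmd_table[0] = 0x27;    /* build H2D register FIS in RAM */
        /* No "dc cvac" here: with the MMU off, the firmware assumes
         * its RAM writes bypass the caches... */
        port->ci = 1;           /* ...but with HCR_EL2.DC=1 the DMA
                                   engine can read stale RAM */
}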

--Anup
