[PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

Marc Zyngier maz at kernel.org
Mon Mar 8 20:03:39 GMT 2021


Hi Alex,

On Mon, 08 Mar 2021 16:53:09 +0000,
Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> 
> Hello,
> 
> It's not clear to me why this patch is needed. If one VCPU in the VM is generating
> code, is it not the software running in the VM responsible for keeping track of
> the MMU state of the other VCPUs and making sure the new code is executed
> correctly? Why should KVM get involved?
> 
> I don't see how this is different than running on bare metal (no
> hypervisor), and one CPU with the MMU on generates code that another
> CPU with the MMU off must execute.

The difference is that so far, we have always considered i-caches to
be private to each CPU. With a hypervisor that allows migration of
vcpus from one physical CPU to another, the i-cache isn't private
anymore from the perspective of the vcpus.
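A minimal user-space model of what this implies. It is a sketch, not kernel code, and `struct vm_model`, `last_vcpu_ran`, `vcpu_load()` and `NR_PCPUS` are hypothetical names. For each physical CPU it remembers which vcpu of the VM last ran there, and it "invalidates" the I-cache whenever a different vcpu is loaded on that CPU. The invalidation is only a counter standing in for IC IALLU.

```c
#include <stdbool.h>

#define NR_PCPUS 4	/* hypothetical number of physical CPUs */
#define NO_VCPU  (-1)	/* nothing has run on this physical CPU yet */

/* Hypothetical per-VM state: which vcpu last ran on each physical CPU. */
struct vm_model {
	int last_vcpu_ran[NR_PCPUS];
	unsigned int icache_flushes;	/* number of modelled IC IALLU calls */
};

static void vm_model_init(struct vm_model *vm)
{
	for (int cpu = 0; cpu < NR_PCPUS; cpu++)
		vm->last_vcpu_ran[cpu] = NO_VCPU;
	vm->icache_flushes = 0;
}

/*
 * Called when vcpu 'vcpu_id' is about to run on physical CPU 'pcpu'.
 * If a different vcpu (or nothing yet) ran there last, the I-cache may
 * hold lines fetched on behalf of another vcpu, for example with its
 * stage 1 MMU off. The model then counts one invalidation and records the
 * new owner. Returns true if an invalidation was modelled.
 */
static bool vcpu_load(struct vm_model *vm, int vcpu_id, int pcpu)
{
	if (vm->last_vcpu_ran[pcpu] != vcpu_id) {
		vm->icache_flushes++;	/* stands in for IC IALLU; DSB; ISB */
		vm->last_vcpu_ran[pcpu] = vcpu_id;
		return true;
	}
	return false;
}
```

For example, loading vcpu 0 on pCPU 1 twice, then vcpu 1 on pCPU 1, models two invalidations. The first comes from the initial load, when nothing is recorded yet. The second comes from the switch to vcpu 1. Invalidating more often than needed is safe. Skipping the invalidation after a different vcpu has run on that CPU is not.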

> 
> Some comments below.
> 
> On 3/6/21 2:15 PM, Catalin Marinas wrote:
> > On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote:
> >> On Fri, 05 Mar 2021 19:07:09 +0000,
> >> Catalin Marinas <catalin.marinas at arm.com> wrote:
> >>> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> >>>> It recently became apparent that the ARMv8 architecture has interesting
> >>>> rules regarding attributes being used when fetching instructions
> >>>> if the MMU is off at Stage-1.
> >>>>
> >>>> In this situation, the CPU is allowed to fetch from the PoC and
> >>>> allocate into the I-cache (unless the memory is mapped with
> >>>> the XN attribute at Stage-2).
> >>> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
> >>> as well?
> >> The ARM ARM doesn't seem to mention FWB at all when it comes to
> >> instruction fetch, which is sort of expected as it only covers the
> >> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
> >> though, as the I-side would then snoop the D-side.
> > Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
> > PoU, while SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
> > don't think I-cache snooping the D-cache would happen to the PoU when
> > the S1 MMU is off.
> 
> FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0} which means
> that no dcache clean is required for instruction to data coherence
> (page D13-3086). I interpret that as saying that with FEAT_FWB,
> CTR_EL0.IDC is effectively 1, which means that dcache clean is not
> required for instruction generation, and icache invalidation is
> required only if CTR_EL0.DIC = 0 (according to B2-158).
> 
> > My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
> > are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
> > accesses may be held in the I-cache.
> 
> Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 mean that
> instruction fetches are made to Normal Cacheable, Inner and Outer
> Read-Allocate memory (ARM DDI 0487G.a, pages D5-2709 and indirectly
> at D13-3586).

I think that's the allocation in unified caches, and not necessarily
the i-cache, given that it also mentions things such as "Inner
Write-Through", which makes no sense for the i-cache.
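A sketch of the SCTLR_EL1 decoding discussed above. It is deliberately simplified: it looks only at the M (bit 0) and I (bit 12) fields and ignores HCR_EL2.DC, the stage 2 attributes and FWB. The macro, enum and function names are hypothetical. The last helper encodes the rule quoted at the start of the thread: with the stage 1 MMU off, instruction fetches may still allocate into the I-cache unless stage 2 maps the page XN.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCTLR_EL1 bit positions (Arm ARM): M is bit 0, I is bit 12. */
#define SCTLR_EL1_M	(UINT64_C(1) << 0)	/* stage 1 MMU enable */
#define SCTLR_EL1_I	(UINT64_C(1) << 12)	/* instruction access cacheability */

/* Hypothetical labels for the stage 1 attributes of an instruction fetch. */
enum ifetch_attr {
	IFETCH_FROM_S1_TABLES,	/* M == 1: taken from the stage 1 page tables */
	IFETCH_NORMAL_NC,	/* M == 0, I == 0: Normal Non-cacheable */
	IFETCH_NORMAL_WT_RA,	/* M == 0, I == 1: Normal, Inner/Outer WT, Read-Allocate */
};

static enum ifetch_attr s1_ifetch_attr(uint64_t sctlr_el1)
{
	if (sctlr_el1 & SCTLR_EL1_M)
		return IFETCH_FROM_S1_TABLES;
	return (sctlr_el1 & SCTLR_EL1_I) ? IFETCH_NORMAL_WT_RA : IFETCH_NORMAL_NC;
}

/*
 * With the stage 1 MMU off, the fetch may end up in the I-cache whatever
 * s1_ifetch_attr() returned, since D4.4.6 allows even Non-cacheable fetches
 * to be held there. In this model only a stage 2 XN mapping prevents it,
 * because the fetch is then not permitted at all.
 */
static bool s1_off_fetch_may_fill_icache(bool s2_xn)
{
	return !s2_xn;
}
```

The attribute returned by `s1_ifetch_attr()` does not feed into `s1_off_fetch_may_fill_icache()`. That separation reflects the point being made here: the cacheability attribute describes the unified caches, not whether the I-cache can hold the line.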

> As you've pointed out, and as mentioned in D4.4.6, it is always
> possible that instruction fetches are held in the instruction cache,
> regardless of the state of the SCTLR_EL1.M bit.

Exactly, and that's what breaks things.

> > The FWB rules on combining S1 and S2 say that Normal Non-cacheable at
> > S1 is "upgraded" to cacheable. This should happen irrespective of
> > whether the S1 MMU is on or off and should apply to both I and D
> > accesses (since it does not explicitly say otherwise). So I think we
> > could skip this IC IALLU when FWB is present.
> >
> > The same logic should apply when the VMM copies the VM text. With FWB,
> > we probably only need D-cache maintenance to PoU and only if
> > CTR_EL0.IDC==0. I haven't checked what the code currently does.
> 
> When FEAT_FWB is implemented, CTR_EL0.IDC is effectively 1 (see above), so we don't
> need a dcache clean in this case.

But that isn't what concerns me. FWB is documented exclusively in
terms of the d-cache, and doesn't describe how it affects
instruction fetch (which is why I'm reluctant to attribute any effect
to it).
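A sketch of the CTR_EL0/CLIDR_EL1 reasoning above. Field positions follow the Arm ARM. The struct and helper names are hypothetical. The function computes the maintenance needed to make newly written code visible to the I-side at the PoU. According to the CTR_EL0.IDC description, no D-cache clean to the PoU is needed in three cases: when IDC is set, when CLIDR_EL1.LoC == 0, or when LoUIS and LoUU are both 0, which FEAT_FWB requires. The sketch deliberately says nothing about fetches made with the stage 1 MMU off, which come from the PoC. It therefore cannot, on its own, justify dropping the I-cache invalidation on vcpu load.

```c
#include <stdbool.h>
#include <stdint.h>

/* CTR_EL0 fields (Arm ARM): IDC is bit 28, DIC is bit 29. */
#define CTR_EL0_IDC	(UINT64_C(1) << 28)
#define CTR_EL0_DIC	(UINT64_C(1) << 29)

/* CLIDR_EL1 fields: LoUIS [23:21], LoC [26:24], LoUU [29:27]. */
static unsigned int clidr_louis(uint64_t clidr) { return (clidr >> 21) & 7; }
static unsigned int clidr_loc(uint64_t clidr)   { return (clidr >> 24) & 7; }
static unsigned int clidr_louu(uint64_t clidr)  { return (clidr >> 27) & 7; }

/* Hypothetical summary of the PoU maintenance needed after writing code. */
struct pou_maintenance {
	bool dcache_clean;	/* DC CVAU over the new code, then DSB ISH */
	bool icache_inval;	/* IC IVAU (or IC IALLU), then DSB ISH; ISB */
};

static struct pou_maintenance code_gen_maintenance(uint64_t ctr, uint64_t clidr)
{
	/* IDC is "effectively 1" if reported, or implied by the CLIDR levels. */
	bool idc = (ctr & CTR_EL0_IDC) ||
		   clidr_loc(clidr) == 0 ||
		   (clidr_louis(clidr) == 0 && clidr_louu(clidr) == 0);

	return (struct pou_maintenance){
		.dcache_clean = !idc,
		.icache_inval = !(ctr & CTR_EL0_DIC),
	};
}
```

With an FWB-style CLIDR_EL1 (LoUIS == LoUU == 0) and DIC == 0, only the I-cache invalidation remains. That is the PoU story. The open question in this thread is whether anything at all removes the need for IC IALLU when a vcpu with its MMU off may have filled the I-cache from the PoC.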

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


