[RFC PATCH] arm64: KVM: honor cacheability attributes on S2 page fault

Catalin Marinas catalin.marinas at arm.com
Fri Oct 11 08:38:40 EDT 2013


On Thu, Oct 10, 2013 at 05:09:03PM +0100, Anup Patel wrote:
> On Thu, Oct 10, 2013 at 4:54 PM, Catalin Marinas
> <catalin.marinas at arm.com> wrote:
> > On Thu, Oct 10, 2013 at 09:39:55AM +0100, Marc Zyngier wrote:
> >> On 10/10/13 05:51, Anup Patel wrote:
> >> > Are you planning to go ahead with this approach ?
> >>
> >> [adding Catalin, as we heavily discussed this recently]
> >>
> >> Not as such, as it doesn't solve the full issue. It merely papers over
> >> the whole "my cache is off" problem. More specifically, any kind of
> >> speculative access from another CPU while caches are off in the guest
> >> completely nukes the benefit of this patch.
> >>
> >> Also, turning the caches off is another source of problems, as
> >> speculation also screws up set/way invalidation.
> >
> > Indeed. The set/way operations trapping and broadcasting (or deferring)
> > to other CPUs in software just happens to work but there is no
> > guarantee, sooner or later we'll hit a problem. I'm even tempted to
> > remove flush_dcache_all() calls on the booting path for the arm64
> > kernel, we already require that whatever runs before Linux should
> > clean&invalidate the caches.
> >
> > Basically, with KVM a VCPU even if running with caches/MMU disabled can
> > still get speculative allocation into the cache. The reason for this is
> > the other cacheable memory aliases created by the host kernel and
> > qemu/kvmtool. I can't tell whether Xen has this issue but it may be
> > easier in Xen to avoid memory aliases.
> >
> >> > We really need this patch for X-Gene L3 cache.
> >>
> >> So far, I can see two possibilities:
> >> - either we mandate caches to be always on (DC bit, and you're not
> >> allowed to turn the caches off).
> >
> > That's my preferred approach. For hotplug, idle, the guest would use an
> > HVC call (PSCI) and the host takes care of re-enabling the DC bit. But
> > we may not catch all cases (kexec probably).
> >
> >> - Or we mandate that caches are invalidated (by VA) for each write that
> >> is performed with caches off.
> >
> > For some things like run-time code patching, on ARMv8 we need to do at
> > least I-cache maintenance since the CPU can allocate into the I-cache
> > (even if there are no aliases).
> 
> It seems all approaches considered so far have a corner case in
> one way or another.

Yes, we are trying to settle on the one with the fewest corner cases.

> Coming back to where we started, the actual problem was that when
> Guest starts booting it sees wrong contents because it runs with the
> MMU disabled and correct contents are still in external L3 cache of X-Gene.

That's one of the problems and I think the easiest to solve. Note that
contents could still be in the L1/L2 (inner) cache since whole cache
flushing by set/way isn't guaranteed in an MP context.

> How about reconsidering the approach of flushing Guest RAM (entire or
> portion of it) to PoC by VA once before the first run of a VCPU ?

Flushing the entire guest RAM is not possible by set/way
(architecturally) and not efficient by VA (though some benchmark would
be good). Marc's patch defers this flushing when a page is faulted in
(at stage 2) and I think it covers the initial boot.
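
To put a rough number on the by-VA cost, here is a minimal sketch that
counts how many per-line maintenance operations a range needs. The
helper name and the 64-byte line size are assumptions for illustration;
real code would read the line size from CTR_EL0 and issue one DC
operation per line.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of cache lines
 * (and therefore by-VA maintenance operations) covering
 * [start, start + size). 'line' must be a power of two.
 */
static uint64_t lines_in_range(uint64_t start, uint64_t size, uint64_t line)
{
	uint64_t first, last;

	assert(line != 0 && (line & (line - 1)) == 0);
	if (size == 0)
		return 0;

	first = start & ~(line - 1);              /* round down to line start */
	last = (start + size - 1) & ~(line - 1);  /* line holding the last byte */
	return (last - first) / line + 1;
}
```

With an assumed 64-byte line, a 4KB page faulted in at stage 2 costs 64
operations, while 1GB of guest RAM flushed up front costs 16777216.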

> OR
> We can also have KVM API using which user space can flush portions
> of Guest RAM before running the VCPU. (I think this was a suggestion
> from Marc Z initially)

This may not be enough. It indeed flushes the kernel image that gets
loaded but the kernel would write other pages (bss, page tables etc.)
with MMU disabled and those addresses may contain dirty cache lines that
have not been covered by the initial kvmtool flush. So you basically
need all guest non-cacheable accesses to be flushed.

The other problems are the cacheable aliases that I mentioned, so even
though the guest does non-cacheable accesses with the MMU off, the
hardware can still allocate into the cache via the other mappings. In
this case the guest needs to invalidate the areas of memory that it
wrote with caches off (or just use the DC bit to force memory accesses
with MMU off to be cacheable).
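
As a rough sketch of the DC bit option: HCR_EL2.DC is bit 12, and with
it set the guest's MMU-off accesses are treated as Normal cacheable.
The structure and function below are hypothetical models, not existing
KVM code, and clearing DC once the guest enables its MMU is an
assumption about how the host might manage it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HCR_DC	(UINT64_C(1) << 12)	/* HCR_EL2.DC: default cacheable */

/* Hypothetical model of the per-VCPU state the host would track. */
struct vcpu_model {
	uint64_t hcr_el2;	/* value loaded into HCR_EL2 on guest entry */
	bool guest_mmu_on;	/* guest has enabled its stage 1 MMU */
};

/*
 * Assumed policy: force cacheable accesses while the guest runs with
 * its MMU off, and drop the override once the guest MMU is enabled.
 */
static void vcpu_update_dc(struct vcpu_model *v)
{
	assert(v != NULL);
	if (v->guest_mmu_on)
		v->hcr_el2 &= ~HCR_DC;
	else
		v->hcr_el2 |= HCR_DC;
}
```

The host would call something like this at VCPU reset and again when a
PSCI call brings a VCPU back with its MMU off.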

-- 
Catalin


