[PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Catalin Marinas catalin.marinas at arm.com
Fri Aug 30 05:44:43 EDT 2013


On Thu, Aug 29, 2013 at 05:02:50PM +0100, Anup Patel wrote:
> On Thu, Aug 29, 2013 at 6:23 PM, Catalin Marinas
> <catalin.marinas at arm.com> wrote:
> > On Thu, Aug 29, 2013 at 01:31:43PM +0100, Anup Patel wrote:
> >> On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas
> >> <catalin.marinas at arm.com> wrote:
> >> > On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
> >> >> The approach of flushing d-cache by set/way upon first run of VCPU will
> >> >> not work because for set/way operations ARM ARM says: "For set/way
> >> >> operations, and for All (entire cache) operations, the point is defined to be
> >> >> to the next level of caching". In other words, set/way operations work up to
> >> >> point of unification.
> >> >
> >> > I don't understand where you got the idea that set/way operations work
> >> > up to the point of unification. This is incorrect, the set/way
> >> > operations work on the level of cache specified by bits 3:1 in the
> >> > register passed to the DC CISW instruction. For your L3 cache, those
> >> > bits would be 2 (and __flush_dcache_all() implementation does this
> >> > dynamically).
> >>
> >> The L3-cache is not visible to CPU. It is totally independent and transparent
> >> to CPU.
> >
> > OK. But you say that operations like DC CIVAC actually flush the L3? So
> > I don't see it as completely transparent to the CPU.
> 
> It is transparent from CPU perspective. In other words, there is nothing in
> CPU for controlling/monitoring L3-cache.

We probably have a different understanding of "transparent". It doesn't
look any more transparent to me than the L1 or L2 cache. Basically,
from a software perspective, it needs maintenance. Whether the CPU
explicitly asks the L3 cache for this or the L3 cache figures it out on
its own based on the L1/L2 operations is irrelevant.

It would have been transparent if the software didn't need to know about
it at all, but that's not the case.

> > Do you have any configuration bits which would make the L3 completely
> > transparent like always caching even when accesses are non-cacheable and
> > DC ops to PoC ignoring it?
> 
> Actually, L3-cache monitors the types of read/write generated by CPU (i.e.
> whether the request is cacheable/non-cacheable or whether the request is
> due to DC ops to PoC, or ...).
> 
> To answer your query, there is no configuration to have L3 caching when
> accesses are non-cacheable and DC ops to PoC.

So it's an outer cache with some "improvements" to handle DC ops to PoC.
I think it was a pretty bad decision on the hardware side as we really
try to get rid of outer caches for many reasons:

1. Non-standard cache flushing operations (MMIO-based)
2. It may require cache maintenance by physical address - something
   hard to get in a guest OS (unless you virtualise L3 cache
   maintenance)
3. Are barriers like DSB propagated correctly? Does a DC op to PoC
   followed by DSB ensure that the L3 drained the cachelines to RAM?

I think point 2 isn't required because your L3 detects DC ops to PoC. I
hope point 3 is handled correctly (otherwise look how "nice" the mb()
macro on arm is to cope with L2x0).

If only point 1 is left, we don't need the full outer_cache framework,
but it still needs to be addressed since the assumption is that
flush_cache_all (or __flush_dcache_all) flushes all cache levels. These
are not used in generic code but are used during kernel boot, in KVM and
in cpuidle drivers.
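
For reference, a minimal host-side sketch of the arithmetic that a
set/way walk such as __flush_dcache_all performs. It only models the
CLIDR_EL1 decoding and the DC CISW operand encoding described in the
ARM ARM; it does not issue any cache instruction. The function names
and the example geometry (16 ways, 64-byte lines) are hypothetical and
chosen for illustration only. The walk covers just the levels the CPU
reports below the Level of Coherence, so a cache that is not described
in CLIDR_EL1 would presumably not be reached this way.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling of log2(x) for 1 <= x <= 2^31. */
static unsigned int ceil_log2(uint32_t x)
{
	unsigned int n = 0;

	while (n < 31 && (UINT32_C(1) << n) < x)
		n++;
	return n;
}

/* Level of Coherence: CLIDR bits [26:24]. */
static unsigned int clidr_loc(uint32_t clidr)
{
	return (clidr >> 24) & 0x7;
}

/* Cache type of a zero-based level: CLIDR bits [3*level+2 : 3*level]. */
static unsigned int clidr_ctype(uint32_t clidr, unsigned int level)
{
	assert(level < 7);
	return (clidr >> (3 * level)) & 0x7;
}

/*
 * Number of levels a set/way walk would visit: every level below LoC
 * whose type is 2 or higher, i.e. it holds data (data-only, separate
 * I+D, or unified).
 */
static unsigned int sw_flush_level_count(uint32_t clidr)
{
	unsigned int level, count = 0;

	for (level = 0; level < clidr_loc(clidr); level++)
		if (clidr_ctype(clidr, level) >= 2)
			count++;
	return count;
}

/*
 * Build a 32-bit set/way operand:
 *   bits [3:1]      zero-based cache level (L1 = 0, L3 = 2)
 *   bits [.. : L]   set index, L = log2(line size in bytes)
 *   bits [31:32-A]  way index, A = ceil(log2(number of ways))
 */
static uint32_t dc_sw_operand(unsigned int level, uint32_t way, uint32_t set,
			      uint32_t num_ways, uint32_t line_bytes)
{
	unsigned int a = ceil_log2(num_ways);
	unsigned int l = ceil_log2(line_bytes);
	uint32_t val = (uint32_t)(level & 0x7) << 1;

	val |= set << l;
	if (a > 0)	/* direct-mapped: no way field, avoid a shift by 32 */
		val |= way << (32 - a);
	return val;
}
```

With these assumptions, dc_sw_operand(2, 1, 3, 16, 64) yields
0x100000C4: level field 2 (the third cache level), set 3, way 1. A
CLIDR value of 0x02000023 (separate L1 I+D, unified L2, LoC = 2) gives
sw_flush_level_count() == 2, so only L1 and L2 would be walked.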

> > Now, back to the idea of outer_cache framework for arm64. Does your CPU
> > have separate instructions for flushing this L3 cache?
> 
> No, CPU does not have separate instruction for flushing L3-cache. On the
> other hand, L3-cache has MMIO registers which can be used to explicitly
> flush L3-cache.

I guess you use those in your firmware or boot loader since Linux
requires clean/invalidated caches at boot (and I plan to push a patch
which removes kernel D-cache cleaning during boot to spot such problems
early). A cpuidle driver would probably need this as well.

-- 
Catalin



More information about the linux-arm-kernel mailing list