Cache clean of page table entries

Catalin Marinas catalin.marinas at arm.com
Tue Nov 9 12:36:29 EST 2010


On Mon, 2010-11-08 at 18:33 +0000, Christoffer Dall wrote:
> On Mon, Nov 8, 2010 at 7:14 PM, Catalin Marinas <catalin.marinas at arm.com> wrote:
> > On Fri, 2010-11-05 at 19:30 +0000, Christoffer Dall wrote:
> >> What happens is this:
> >>  - The guest kernel allocates memory and writes a guest page table entry.
> >
> > Which address does it use to write the page table entry?
> 
> It uses "its own" virtual address. The guest has no knowledge of a
> host or qemu and acts as if it runs natively. Therefore, if a standard
> kernel maps its page tables at 0xc0002000, then the guest will write
> the entries using 0xc0002000. It's up to KVM to create a mapping from
> 0xc0002000 to a physical address. There will also be a mapping from
> the Qemu process' address space to that same physical address and
> possibly in the host kernel address space as well.

OK, so it may be more efficient on ARMv7 (or ARMv6 with non-aliasing
VIPT caches) to avoid extra flushing for aliases.

> > I assume the address used at this stage is the one that Qemu uses in
> > the host OS. Does the OS make
> > any assumptions that the caches are disabled (Linux does this when
> > setting up the initial page tables)? But the memory accesses are
> > probably cacheable from the Qemu space.
> 
> Yes, the entries are always marked as cacheable. The assumption that
> the MMU is turned off is only in the initial assembly code in head.S,
> right? Once we're in start_kernel(...) and subsequently
> paging_init(...) the MMU is on and the kernel must clean the caches,
> right?

Does KVM trap the cache maintenance operations that the guest kernel
does and emulate them? There is even a full D-cache flush before the
MMU is enabled in the guest OS.
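
To make it concrete, something like the sketch below is what I mean by
emulating them. The mcr encodings are the standard ARMv6 ones from the
ARM ARM; the function name, arguments and trap plumbing are entirely
made up for illustration:

/*
 * Hypothetical handler, invoked once a trapped guest MCR to CP15 c7
 * (cache maintenance) has been decoded. 'mva' is the virtual address
 * the guest operated on; before performing the operation in the host
 * it has to be mapped (or an identical alias created) in the current
 * address space - more on that below.
 */
static void emulate_guest_cache_op(int crm, int op2, unsigned long mva)
{
        if (crm == 10 && op2 == 1)      /* clean D-cache line by MVA */
                asm volatile("mcr p15, 0, %0, c7, c10, 1" : : "r" (mva));
        else if (crm == 14 && op2 == 1) /* clean+invalidate line by MVA */
                asm volatile("mcr p15, 0, %0, c7, c14, 1" : : "r" (mva));
        else if (crm == 6 && op2 == 1)  /* invalidate line by MVA */
                asm volatile("mcr p15, 0, %0, c7, c6, 1" : : "r" (mva));
        else if (crm == 10 && op2 == 4) /* data synchronisation barrier */
                asm volatile("mcr p15, 0, %0, c7, c10, 4" : : "r" (0));
}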

> >>  - Later, the guest tries to access the virtual address mapped through
> >> the above entry
> >>  - The driver (KVM) will have to create a corresponding mapping in
> >> its shadow page tables (which are the ones used by the MMU). To do
> >> so, it must read the guest page table.
> >>  - Before reading the data, the user space address (which is passed to
> >> copy_from_user) is invalidated on the cache.
> >>  - From time to time, however, the read returns incorrect
> >> (uninitialized or stale) data.
> >
> > This usually happens because you may have invalidated a valid cache
> > line which didn't make it to RAM. You either use a flush
> > (clean+invalidate) or make sure that the corresponding cache line has
> > been flushed by whoever wrote that address. I think the former is
> > safer.
> 
> Yes, I learned that recently by spending a lot of time debugging
> seemingly spurious bugs on the host. However, do you know how much of
> a performance difference there is between flushing and invalidating a
> clean line?

Flushing is more expensive if there are dirty cache lines since they
need to be written back and that depends on the bus and RAM speeds. But
flushing and invalidating are operations to be used in different
situations.
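
To make the semantics concrete, these are the two ARMv6 operations by
MVA (a minimal sketch; only the encodings are from the ARM ARM):

/* Invalidate only: the line is simply discarded, so any dirty data
 * in it is lost. Cheap, but only safe if the line is known clean. */
static inline void dcache_inv_line(unsigned long mva)
{
        asm volatile("mcr p15, 0, %0, c7, c6, 1" : : "r" (mva));
}

/* Clean+invalidate ("flush"): dirty data is written back to RAM
 * first, so it is always safe, at the cost of the write-back. */
static inline void dcache_flush_line(unsigned long mva)
{
        asm volatile("mcr p15, 0, %0, c7, c14, 1" : : "r" (mva));
}

On a clean line the two should cost about the same; the difference
only shows up when the line is dirty.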

If cache maintenance in the guest OS for page tables is done properly
(i.e. cleaning or flushing is trapped by KVM and emulated), in general
you only need to do an invalidation in the host kernel before reading.
If you have non-aliasing VIPT caches, you don't even need this
invalidation.
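
So the host-side read could be reduced to something like this (a
sketch; kvm_read_guest_pte() is a name I made up, and dcache_inv_line()
is from the snippet above):

#include <linux/uaccess.h>
#include <asm/cachetype.h>

/* Read one guest page table entry through the Qemu user mapping. */
static int kvm_read_guest_pte(const u32 __user *uaddr, u32 *pte)
{
        /*
         * A stale line for this alias may sit in the cache. Invalidation
         * is sufficient (and safe) only because the guest's own clean/
         * flush operations are assumed to be trapped and emulated, so no
         * dirty data can be lost here. A 4-byte entry fits in one line.
         */
        if (!cache_is_vipt_nonaliasing())
                dcache_inv_line((unsigned long)uaddr);

        return copy_from_user(pte, uaddr, sizeof(*pte)) ? -EFAULT : 0;
}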

But on the cache maintenance emulation part, is KVM switching the TTBR
to the host OS when emulating the operations? If yes, the original
virtual address is no longer present so you need to create the same
alias before flushing (need to look at the KVM patches at some point).

> > As long as you use copy_from_user which gets the same user virtual
> > address, there is no need for any cache maintenance; you read it via the
> > same alias so you hit the same cache lines anyway.
> 
> I hope I explained this reasonably above. To clarify, the only time
> Qemu writes to guest memory (ignoring i/o) is before initial boot when
> it writes the bootloader and the kernel image to memory.

That's clear now.

Can you not force the ARMv6 to run in non-aliasing mode? I think there
is a bit in some CP15 register (depending on the implementation) but it
would limit the amount of cache to 16K (or 4K per way). Overall, it may
be cheaper than all the cache maintenance that you have to do.
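
At least whether a given implementation aliases can be detected at run
time. On ARMv6 the Cache Type Register has an aliasing/page-colouring
bit, which is what the kernel's cache_is_vipt_aliasing() check is
derived from (a sketch, assuming the ARMv6 CTR layout):

/* ARMv6 Cache Type Register, bit 23: set if the D-cache can alias
 * (mirrors the cacheid_init() check in arch/arm/kernel/setup.c). */
static inline int dcache_is_aliasing(void)
{
        unsigned int ctr;

        asm volatile("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
        return (ctr >> 23) & 1;
}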

> > In general, yes. But a guest OS may assume that the D-cache is disabled
> > (especially during booting) and not do any cache maintenance.
> >
> > There is another situation where a page is allocated by Qemu and zero'ed
> > by the kernel while the guest kernel tries to write it via a different
> > mapping created by KVM. It only flushes the latter while the former may
> > have some dirty cache lines being evicted (only if there is D-cache
> > aliasing on ARMv6).
> 
> I'm not sure what you mean here. Can you clarify a little?

Maybe it's not entirely clear to me how KVM works. So Qemu has a virtual
address space in the host OS. The guest OS has yet another virtual
address space inside the virtual space of Qemu.

The Qemu virtual space is allocated by the host kernel. The anonymous
pages are zero'ed or copied-on-write by the kernel before being mapped
into user space. But cache flushing takes place already (at least in
newer kernels), so that's not an issue.

When creating a virtual mapping, does KVM pin the Qemu pages in memory
using something like get_user_pages() or does KVM allocate the pages
itself?
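
If it doesn't already, pinning would look roughly like this (a sketch
against the current get_user_pages() prototype; error handling
trimmed):

#include <linux/mm.h>
#include <linux/sched.h>

/* Pin the Qemu page backing a guest page table so it cannot be
 * swapped out while KVM walks it. Drop with put_page() when done. */
static struct page *pin_guest_pt_page(unsigned long uaddr)
{
        struct page *page;
        int ret;

        down_read(&current->mm->mmap_sem);
        ret = get_user_pages(current, current->mm, uaddr & PAGE_MASK,
                             1, 1 /* write */, 0 /* force */,
                             &page, NULL);
        up_read(&current->mm->mmap_sem);

        return ret == 1 ? page : NULL;
}

Note that kmap()/page_address() on the pinned page gives you yet
another (kernel) alias, which on an aliasing VIPT cache needs the same
maintenance as the user one.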

Catalin



