[PATCH v5 0/2] MTE support for KVM guest

Mon Dec 7 11:34:05 EST 2020

On Mon, Dec 07, 2020 at 04:05:55PM +0000, Marc Zyngier wrote:
> On 2020-12-07 15:45, Steven Price wrote:
> > On 07/12/2020 15:27, Peter Maydell wrote:
> > > On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price at arm.com>
> > > wrote:
> > > > Sounds like you are making good progress - thanks for the
> > > > update. Have
> > > > you thought about how the PROT_MTE mappings might work if QEMU itself
> > > > were to use MTE? My worry is that we end up with MTE in a guest
> > > > preventing QEMU from using MTE itself (because of the PROT_MTE
> > > > mappings). I'm hoping QEMU can wrap its use of guest memory in a
> > > > sequence which disables tag checking (something similar will be
> > > > needed
> > > > for the "protected VM" use case anyway), but this isn't
> > > > something I've
> > > > looked into.
> > > 
> > > It's not entirely the same as the "protected VM" case. For that
> > > the patches currently on list basically special case "this is a
> > > debug access (eg from gdbstub/monitor)" which then either gets
> > > to go via "decrypt guest RAM for debug" or gets failed depending
> > > on whether the VM has a debug-is-ok flag enabled. For an MTE
> > > guest the common case will be guests doing standard DMA operations
> > > to or from guest memory. The ideal API for that from QEMU's
> > > point of view would be "accesses to guest RAM don't do tag
> > > checks, even if tag checks are enabled for accesses QEMU does to
> > > memory it has allocated itself as a normal userspace program".
> > 
> > Sorry, I know I simplified it rather by saying it's similar to
> > protected VM. Basically as I see it there are three types of memory
> > access:
> > 
> > 1) Debug case - has to go via a special case for decryption or
> > ignoring the MTE tag value. Hopefully this can be abstracted in the
> > same way.
> > 
> > 2) Migration - for a protected VM there's likely to be a special
> > method to allow the VMM access to the encrypted memory (AFAIK memory
> > is usually kept inaccessible to the VMM). For MTE this again has to be
> > special cased as we actually want both the data and the tag values.
> > 
> > 3) Device DMA - for a protected VM it's usual to unencrypt a small
> > area of memory (with the permission of the guest) and use that as a
> > bounce buffer. This is possible with MTE: have an area the VMM
> > purposefully maps with PROT_MTE. The issue is that this has a
> > performance overhead and we can do better with MTE because it's
> > trivial for the VMM to disable the protection for any memory.
> > 
> > The part I'm unsure on is how easy it is for QEMU to deal with (3)
> > without the overhead of bounce buffers. Ideally there'd already be a
> > wrapper for guest memory accesses and that could just be wrapped with
> > setting TCO during the access. I suspect the actual situation is more
> > complex though, and I'm hoping Haibo's investigations will help us
> > understand this.
> 
> What I'd really like to see is a description of how shared memory
> is, in general, supposed to work with MTE. My gut feeling is that
> it doesn't, and that you need to turn MTE off when sharing memory
> (either implicitly or explicitly).

The allocation tag (in-memory tag) is a property assigned to a physical
address range and it can be safely shared between different processes as
long as they access it via pointers with the same allocation tag (bits
59:56). The kernel enables such tagged shared memory for user processes
(anonymous, tmpfs, shmem).

What we don't have in the architecture is a memory type which allows
access to tags but no tag checking. To access the data when the tags
aren't known, the tag checking would have to be disabled via either a
prctl() or by setting the PSTATE.TCO bit.

The kernel accesses the user memory via the linear map using a match-all
tag 0xf, so no TCO bit toggling. For user, however, we disabled such
match-all tag and it cannot be enabled at run-time (at least not easily,
it's cached in the TLB). However, we already have two modes to disable
tag checking which Qemu could use when migrating data+tags.

-- 
Catalin