[PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported
Aneesh Kumar K.V
aneesh.kumar at kernel.org
Tue Jan 28 02:31:18 PST 2025
Catalin Marinas <catalin.marinas at arm.com> writes:
> On Mon, Jan 13, 2025 at 12:47:54PM -0800, Peter Collingbourne wrote:
>> On Mon, Jan 13, 2025 at 11:09 AM Catalin Marinas
>> <catalin.marinas at arm.com> wrote:
>> > On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
>> > > Catalin Marinas <catalin.marinas at arm.com> writes:
>> > > > On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> > > >> Currently, the kernel won't start a guest if the MTE feature is enabled
>> > >
>> > > ...
>> > >
>> > > >> @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>> > > >> if (!vma)
>> > > >> break;
>> > > >>
>> > > >> - if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
>> > > >> + if (kvm_has_mte(kvm) &&
>> > > >> + !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
>> > > >> ret = -EINVAL;
>> > > >> break;
>> > > >> }
>> > > >
>> > > > I don't think we should change this, or at least not how it's done above
>> > > > (Suzuki raised a related issue internally relaxing this for VM_PFNMAP).
>> > > >
>> > > > For standard memory slots, we want to reject them upfront rather than
>> > > > deferring to the fault handler. An example here is file mmap() passed as
>> > > > standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
>> > > > I'd only relax this for VM_PFNMAP mappings further down in this
>> > > > function (and move the VM_PFNMAP check above; see Suzuki's internal
>> > > > patch, unless he posted it publicly already).
>> > >
>> > > But we want to handle memslots backed by pagecache pages for virtio-shm
>> > > here (virtiofs dax use case).
>> >
>> > Ah, I forgot about this use case. So with virtiofs DAX, does a host page
>> > cache page (host VMM mmap()) get mapped directly into the guest as a
>> > separate memory slot? In this case, the host vma would not have
>> > VM_MTE_ALLOWED set.
>> >
>> > > With MTE_PERM, we can essentially skip the
>> > > kvm_vma_mte_allowed(vma) check because we handle all types in the fault
>> > > handler.
>> >
>> > This was pretty much the early behaviour when we added KVM support for
>> > MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
>> > VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
>> > arm64: unify the tests for VMAs in memslots when MTE is enabled")
>> > changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
>> > subsequent commit removed the VM_SHARED check.
>> >
>> > It's a minor ABI change but I'm trying to figure out why we needed this
>> > upfront check rather than simply dropping the VM_SHARED check. Adding
>> > Peter in case he remembers. I can't see any race if we simply skipped
>> > this check altogether, irrespective of FEAT_MTE_PERM.
>>
>> I don't see a problem with removing the upfront check. The reason I
>> kept the check was IIRC just that there was already a check there and
>> its logic needed to be adjusted for my VM_SHARED changes.
>
> Prior to commit d89585fbb308, kvm_arch_prepare_memory_region() only
> rejected a memory slot if VM_SHARED was set. This commit unified the
> checking with user_mem_abort(), with slots being rejected if
> (!VM_MTE_ALLOWED || VM_SHARED). A subsequent commit dropped the
> VM_SHARED check, so we ended up with memory slots being rejected only if
> !VM_MTE_ALLOWED (of course, if kvm_has_mte()). This wasn't the case
> before the VM_SHARED relaxation.
>
> So if you don't remember any strong reason for this change, I think we
> should go back to the original behaviour of deferring the VM_MTE_ALLOWED
> check to user_mem_abort() (and still permitting VM_SHARED).
>
Something like the below?
From 466237a6f0a165152c157ab4a73f34c400cffe34 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V (Arm)" <aneesh.kumar at kernel.org>
Date: Tue, 28 Jan 2025 14:21:52 +0530
Subject: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation
Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
rejected a memory slot if VM_SHARED was set. That commit unified the
checking with user_mem_abort(), with slots being rejected if either
VM_MTE_ALLOWED is not set or VM_SHARED is set. A subsequent commit
c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
enabled") dropped the VM_SHARED check, so we ended up with memory slots
being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
commit d89585fbb308.

Memory slots with VM_SHARED set were rejected to avoid a race condition
with the test/set of the PG_mte_tagged flag. Before commit d77e59a8fccd
("arm64: mte: Lock a page for MTE tag initialization"), the kernel did
not allow MTE with shared pages, thereby preventing two tasks sharing a
page from setting the PG_mte_tagged flag racily. Commit d77e59a8fccd
updated the locking so that the kernel allows VM_SHARED mappings with
MTE. With that commit, we can enable memslot creation with VM_SHARED
VMA mappings.

This patch results in a minor tweak to the ABI. We now allow creating
memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses
such a memslot with Allocation Tags, the kernel will return -EFAULT,
i.e., instead of failing early, we now fail later during KVM_RUN.

This change is needed because, without it, users are not able to use MTE
with VFIO passthrough, as shown below (kvmtool VMM):

[ 617.921030] vfio-pci 0000:01:00.0: resetting
[ 618.024719] vfio-pci 0000:01:00.0: reset done
Error: 0000:01:00.0: failed to register region with KVM
Warning: [0abc:aced] Error activating emulation for BAR 0
Error: 0000:01:00.0: failed to configure regions
Warning: Failed init: vfio__init
Fatal: Initialisation failed
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar at kernel.org>
---
arch/arm64/kvm/mmu.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 007dda958eab..610becd8574e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2146,11 +2146,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		if (!vma)
 			break;
 
-		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
-			ret = -EINVAL;
-			break;
-		}
-
 		if (vma->vm_flags & VM_PFNMAP) {
 			/* IO region dirty page logging not allowed */
 			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
--
2.43.0
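
For reference, here is a small user-space model of how the memslot-time
check changed across the commits discussed above, and where the failure
moves to with this patch. It is only a sketch of the behaviour as
described in this thread, not kernel code: the MODEL_* flag values, the
enum labels and both function names are made up for illustration, and
the fault-time model ignores everything except the VM_MTE_ALLOWED test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits only; these are NOT the kernel's VM_* values. */
#define MODEL_VM_SHARED		(1u << 0)
#define MODEL_VM_MTE_ALLOWED	(1u << 1)

/* Hypothetical labels for the points in history discussed in the thread. */
enum model_era {
	BEFORE_D89585FBB308,	/* only VM_SHARED rejected */
	AFTER_D89585FBB308,	/* !VM_MTE_ALLOWED or VM_SHARED rejected */
	AFTER_C911F0D46879,	/* only !VM_MTE_ALLOWED rejected */
	WITH_THIS_PATCH,	/* nothing rejected at memslot creation */
};

/*
 * Memslot creation on an MTE-enabled VM (assumed throughout this model).
 * Returns 0 on success, -EINVAL if the slot is rejected upfront.
 */
static int model_prepare_memory_region(enum model_era era,
				       unsigned int vm_flags)
{
	bool shared = vm_flags & MODEL_VM_SHARED;
	bool mte_allowed = vm_flags & MODEL_VM_MTE_ALLOWED;
	bool reject = false;

	switch (era) {
	case BEFORE_D89585FBB308:
		reject = shared;
		break;
	case AFTER_D89585FBB308:
		reject = !mte_allowed || shared;
		break;
	case AFTER_C911F0D46879:
		reject = !mte_allowed;
		break;
	case WITH_THIS_PATCH:
		reject = false;
		break;
	default:
		assert(!"unknown era");
	}

	return reject ? -EINVAL : 0;
}

/*
 * Deferred check at fault time, simplified to the single test the commit
 * message relies on: a VMA without VM_MTE_ALLOWED makes KVM_RUN fail
 * with -EFAULT instead of the memslot being refused earlier.
 */
static int model_user_mem_abort(unsigned int vm_flags)
{
	return (vm_flags & MODEL_VM_MTE_ALLOWED) ? 0 : -EFAULT;
}
```

With this model, a VMA with neither flag set is refused with -EINVAL in
the two middle eras, is accepted at memslot creation with this patch,
and only produces -EFAULT once model_user_mem_abort() is reached.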