[PATCH] KVM: arm64: Drop mte_allowed check during memslot creation

Wed Feb 26 08:48:21 PST 2025

Catalin Marinas <catalin.marinas at arm.com> writes:

> On Wed, Feb 26, 2025 at 03:28:26PM +0530, Aneesh Kumar K.V wrote:
>> Marc Zyngier <maz at kernel.org> writes:
>> > On Mon, 24 Feb 2025 16:44:06 +0000,
>> > Aneesh Kumar K.V <aneesh.kumar at kernel.org> wrote:
>> >> >> On Mon, Feb 24, 2025 at 12:24:14PM +0000, Marc Zyngier wrote:
>> >> >> > > On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> >> >> > > > This change is needed because, without it, users are not able to use MTE
>> >> >> > > > with VFIO passthrough (currently the mapping is either Device or
>> >> >> > > > NonCacheable for which tag access check is not applied.), as shown
>> >> >> > > > below (kvmtool VMM).
> [...]
>> >> >> > My other concern is that this gives pretty poor consistency to the
>> >> >> > guest, which cannot know what can be tagged and what cannot, and
>> >> >> > breaks a guarantee that the guest should be able to rely on.
> [...]
>> >> What if we trigger a memory fault exit with the TAGACCESS flag, allowing
>> >> the VMM to use the GPA to retrieve additional details and print extra
>> >> information to aid in analysis? BTW, we will do this on the first fault
>> >> in cacheable, non-tagged memory even if there is no tagaccess in that
>> >> region. This can be further improved using the NoTagAccess series I
>> >> posted earlier, which ensures the memory fault exit occurs only on
>> >> actual tag access
>> >> 
>> >> Something like below?
>> >
>> > Something like that, only with:
>> >
>> > - a capability informing userspace of this behaviour
>> >
>> > - a per-VM (or per-VMA) flag as a buy-in for that behaviour
>> 
>> If we’re looking for a capability based control, could we tie that up to
>> FEAT_MTE_PERM? That’s what I did here:
>> 
>> https://lore.kernel.org/all/20250110110023.2963795-1-aneesh.kumar@kernel.org
>> 
>> That patch set also addresses the issue mentioned here. Let me know if
>> you think this is a better approach
>
> From the patch linked above:
>
> | @@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> |  		if (!vma)
> |  			break;
> | 
> | -		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> | +		if (kvm_has_mte(kvm) &&
> | +		    !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
> |  			ret = -EINVAL;
> |  			break;
> |  		}
>
> we also have the same ABI change every time FEAT_MTE_PERM is present.
> TBH, I'd rather have it from the start as per the patch in this thread,
> irrespective of FEAT_MTE_PERM. I'm fine, however, with better exit to
> VMM information though.
>

The patch also does:

#define kvm_has_mte_perm(kvm)					\
	(system_supports_notagaccess() &&				\
	 test_bit(KVM_ARCH_FLAG_MTE_PERM_ENABLED, &(kvm)->arch.flags))

That is, it depends on userspace to drive the behavior and also relies on
the FEAT_MTE_PERM hardware feature. I was wondering whether, if we're
introducing this capability, we should tie it to FEAT_MTE_PERM as well,
since adding FEAT_MTE_PERM support would also require a capability to
handle VM migration.
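
To make the combined condition easier to reason about, here is a small
user-space model of the macro together with the memslot check from the
quoted hunk. It is only a sketch, not kernel code: struct vm_model and the
model_* helpers are hypothetical names, with booleans standing in for
kvm_has_mte(), system_supports_notagaccess() and the
KVM_ARCH_FLAG_MTE_PERM_ENABLED bit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VM state consulted by the check. */
struct vm_model {
	bool has_mte;          /* models kvm_has_mte(kvm) */
	bool hw_notagaccess;   /* models system_supports_notagaccess() */
	bool mte_perm_enabled; /* models KVM_ARCH_FLAG_MTE_PERM_ENABLED */
};

/* Models kvm_has_mte_perm(): hardware support AND userspace opt-in. */
static bool model_has_mte_perm(const struct vm_model *vm)
{
	assert(vm != NULL);
	return vm->hw_notagaccess && vm->mte_perm_enabled;
}

/*
 * Models the memslot check from the quoted hunk: a VMA that does not
 * allow MTE is rejected only when the VM uses MTE and MTE_PERM is not
 * in effect for it.
 */
static int model_prepare_memslot(const struct vm_model *vm,
				 bool vma_mte_allowed)
{
	if (vm->has_mte && !model_has_mte_perm(vm) && !vma_mte_allowed)
		return -EINVAL;
	return 0;
}
```

In this model, an MTE VM without the userspace opt-in still gets -EINVAL
for a VMA that does not allow MTE, so the ABI only changes for VMMs that
ask for it on hardware that supports it.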

>
> If we don't want to confuse the VMMs, we could skip the
> kvm_vma_mte_allowed() check only for VM_ALLOW_ANY_UNCACHED and
> VM_PFNMAP vmas, maybe the latter only with FEAT_MTE_PERM. I don't think
> the VMM would get it wrong here since a VFIO mmap() would not support
> MTE anyway.

ok.
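
A rough user-space model of how I read that suggestion (not kernel code):
skip the MTE-allowed check for VM_ALLOW_ANY_UNCACHED vmas, and for
VM_PFNMAP vmas only when MTE_PERM is in effect. The MODEL_* bit values,
struct vma_model and the model_* helpers are placeholders made up for this
sketch, not the kernel's definitions, and treating the two flags as
independent conditions is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values for this model only; not the kernel's. */
#define MODEL_VM_PFNMAP             (1UL << 0)
#define MODEL_VM_ALLOW_ANY_UNCACHED (1UL << 1)
#define MODEL_VM_MTE_ALLOWED        (1UL << 2)

/* Hypothetical stand-in for the part of a VMA the check looks at. */
struct vma_model {
	unsigned long vm_flags;
};

/* True when the MTE-allowed check would be skipped for this VMA. */
static bool model_skip_mte_check(const struct vma_model *vma,
				 bool has_mte_perm)
{
	assert(vma != NULL);
	if (vma->vm_flags & MODEL_VM_ALLOW_ANY_UNCACHED)
		return true;
	/* Assumption: PFNMAP is exempt only when MTE_PERM is in effect. */
	if ((vma->vm_flags & MODEL_VM_PFNMAP) && has_mte_perm)
		return true;
	return false;
}

/* Models the per-VMA decision made while preparing a memslot. */
static int model_check_vma(bool has_mte, bool has_mte_perm,
			   const struct vma_model *vma)
{
	if (!has_mte)
		return 0;
	if (model_skip_mte_check(vma, has_mte_perm))
		return 0;
	if (!(vma->vm_flags & MODEL_VM_MTE_ALLOWED))
		return -EINVAL;
	return 0;
}
```

With this, an ordinary anonymous mapping that does not allow MTE is still
rejected for an MTE VM, so only the VFIO-style mappings see a change.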

-aneesh