[PATCH 5/5] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
Chao Gao
chao.gao at intel.com
Tue Aug 5 23:06:31 PDT 2025
>Oof. And as Chao pointed out[*], removing the vm_dead check would allow creating
>and running vCPUs in a dead VM, which is most definitely not desirable. Squashing
>the vCPU creation case is easy enough if we keep vm_dead but still generally allow
>ioctls, and it's probably worth doing that no matter what (to plug the hole where
>pending vCPU creations could succeed):
>
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index d477a7fda0ae..941d2c32b7dc 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -4207,6 +4207,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>
> mutex_lock(&kvm->lock);
>
>+ if (kvm->vm_dead) {
>+ r = -EIO;
>+ goto unlock_vcpu_destroy;
>+ }
>+
yes. this addresses my concern.
> if (kvm_get_vcpu_by_id(kvm, id)) {
> r = -EEXIST;
> goto unlock_vcpu_destroy;
>
>And then to ensure vCPUs can't do anything, check KVM_REQ_VM_DEAD after acquiring
>vcpu->mutex.
>
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index 6c07dd423458..883077eee4ce 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -4433,6 +4433,12 @@ static long kvm_vcpu_ioctl(struct file *filp,
>
> if (mutex_lock_killable(&vcpu->mutex))
> return -EINTR;
>+
>+ if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) {
>+ r = -EIO;
>+ goto out;
>+ }
>+
> switch (ioctl) {
> case KVM_RUN: {
> struct pid *oldpid;
>
>
>That should address all TDVPS paths (I hope), and I _think_ would address all
>MMU-related paths as well? E.g. prefault requires a vCPU.
>
>Disallowing (most) vCPU ioctls but not all VM ioctls on vm_dead isn't great ABI
>(understatement), but I think we need/want the above changes even if we keep the
>general vm_dead restriction. And given the extremely ad hoc behavior of taking
>kvm->lock for VM ioctls, trying to enforce vm_dead for "all" VM ioctls seems like
>a fool's errand.
>
>So I'm leaning toward keeping "KVM: Reject ioctls only if the VM is bugged, not
>simply marked dead" (with a different shortlog+changelog), but keeping vm_dead
>(and not introducing kvm_tdx.vm_terminated).
Sounds good to me.
With kvm_tdx.vm_terminated removed, we should consider adding a comment above
the is_hkid_assigned() check in tdx_sept_remove_private_spte() to clarify that
!is_hkid_assigned() indicates the guest has been terminated, allowing private
pages to be reclaimed directly without zapping.
More information about the linux-arm-kernel
mailing list