[RFC PATCH v6 32/35] KVM: Add KVM_EXIT_RLIMIT exit_reason
Alexandru Elisei
alexandru.elisei at arm.com
Fri Nov 14 08:07:13 PST 2025
Arm CPUs can optionally implement a feature called Statistical Profiling
Extension (SPE). When this feature is in use, a record is created at
certain intervals, with information about the operation that the CPU was
executing when the record was created. This record is then written to a
buffer in memory.
The buffer where records are written is defined by virtual addresses (a
base and a limit). The translation from a buffer virtual address to a
physical address is performed using the CPU's translation tables. If the
Statistical Profiling Unit (SPU) encounters a stage 2 fault during the
translation, profiling stops and the fault is reported to the CPU via an
**interrupt**, not via a synchronous exception as is the case for CPU MMU
faults.
Because the interrupt is asynchronous, operations executed by the CPU
after the SPU asserts the interrupt and before the CPU receives it are
not sampled by the SPU. As a result, the same code produces different
sampling profiles on bare metal and in a virtual machine.
The solution is to pre-fault the memory representing the buffer and pin
it in the host so it doesn't get unmapped from stage 2. Furthermore, the
host memory representing the guest's translation tables for the buffer
virtual addresses must also be pinned, to avoid faults on translation
table walks. The stage 1 tables that map the buffer are programmed by
the guest, and this makes it impossible for KVM to know beforehand how
many levels and how many pages it needs to pin; KVM has this information
only after walking the guest's stage 1 tables, when the running guest
enables the buffer.
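To give a feel for why the count depends on the guest's mappings, here is
a rough sketch of the arithmetic. It is not the code from this series. It
assumes a 4KiB granule, a 4-level stage 1 walk, page-granular mappings (no
block entries) and an exclusive limit address; pages_to_pin() and the
macros are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4KiB granule */
#define SKETCH_TABLE_BITS 9  /* 512 entries per table with a 4KiB granule */
#define SKETCH_NR_LEVELS  4  /* assumption: 4-level stage 1 walk */

/* Number of 2^shift-sized units touched by the inclusive range [first, last]. */
static uint64_t units_spanned(uint64_t first, uint64_t last, unsigned int shift)
{
	return (last >> shift) - (first >> shift) + 1;
}

/*
 * Estimate how many guest pages back the buffer [base, limit) plus the
 * stage 1 tables used to translate it. Block mappings, if the guest uses
 * them, would shorten the walk and lower the count.
 */
uint64_t pages_to_pin(uint64_t base, uint64_t limit)
{
	uint64_t last, total;
	unsigned int i;

	assert(base < limit);
	last = limit - 1;

	/* Pages holding the buffer itself. */
	total = units_spanned(base, last, SKETCH_PAGE_SHIFT);

	/*
	 * One table page per distinct table touched at each level. A table
	 * i levels above the leaf pages maps 2^(12 + 9 * i) bytes.
	 */
	for (i = 1; i <= SKETCH_NR_LEVELS; i++)
		total += units_spanned(base, last,
				       SKETCH_PAGE_SHIFT + i * SKETCH_TABLE_BITS);

	return total;
}
```

Under these assumptions, a 64KiB buffer at a 2MiB-aligned address needs 16
buffer pages plus one table per level, while a two-page buffer that
straddles a 2MiB boundary needs two last-level tables instead of one.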
Memory pinned by KVM for the buffer must be subject to the RLIMIT_MEMLOCK
limit. Add a new KVM_RUN exit reason for KVM to let userspace know when the
limit has been exceeded, and by how much. Userspace can then decide whether
it wants to (or can) raise the limit further.
Signed-off-by: Alexandru Elisei <alexandru.elisei at arm.com>
---
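For illustration only, a VMM might react to the new exit roughly as
sketched below. Since KVM_EXIT_RLIMIT is not in released uapi headers, the
sketch mirrors the proposed payload in a local struct (rlimit_exit) rather
than using struct kvm_run; new_soft_limit() and handle_rlimit_exit() are
hypothetical helpers. It also assumes that the limit KVM checked is the
calling process's current soft limit.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Value proposed by this patch; defined locally for the sketch. */
#define KVM_EXIT_RLIMIT 41

/* Local mirror of the proposed kvm_run.rlimit payload. */
struct rlimit_exit {
	uint64_t excess;
	uint8_t rlimit_id;
};

/*
 * Compute a soft limit that covers the reported excess.
 * Returns 0 and stores the value in *out, or -1 if the hard limit
 * (or arithmetic overflow) prevents it.
 */
int new_soft_limit(const struct rlimit *cur, uint64_t excess, rlim_t *out)
{
	rlim_t want;

	assert(cur && out);

	if (cur->rlim_cur == RLIM_INFINITY) {
		*out = RLIM_INFINITY;
		return 0;
	}
	/* Reject sums that would overflow or collide with RLIM_INFINITY. */
	if (excess >= RLIM_INFINITY - cur->rlim_cur)
		return -1;

	want = cur->rlim_cur + (rlim_t)excess;
	if (cur->rlim_max != RLIM_INFINITY && want > cur->rlim_max)
		return -1;

	*out = want;
	return 0;
}

/* Try to raise the soft limit named by the exit; 0 on success, -1 otherwise. */
int handle_rlimit_exit(const struct rlimit_exit *ex)
{
	struct rlimit rl;
	rlim_t want;

	if (getrlimit(ex->rlimit_id, &rl) < 0)
		return -1;
	if (new_soft_limit(&rl, ex->excess, &want) < 0)
		return -1;

	rl.rlim_cur = want;
	return setrlimit(ex->rlimit_id, &rl);
}
```

A VMM would call something like handle_rlimit_exit() when exit_reason is
KVM_EXIT_RLIMIT and, on success, re-enter KVM_RUN. What KVM does on
re-entry is not specified by this patch alone.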
Documentation/virt/kvm/api.rst | 13 +++++++++++++
include/uapi/linux/kvm.h | 6 ++++++
2 files changed, 19 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 10e0733297ac..2276c4590948 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7318,6 +7318,19 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
+::
+
+ /* KVM_EXIT_RLIMIT */
+ struct {
+ __u64 excess;
+ __u8 rlimit_id;
+ } rlimit;
+
+If the exit_reason is KVM_EXIT_RLIMIT, the VCPU has exceeded a system resource
+limit. The 'rlimit_id' is set to the resource limit ID (see man 2 getrlimit),
+and the 'excess' field is set to the amount by which the limit was exceeded.
+The unit of measurement is the unit of measurement associated with the resource
+limit.
.. _cap_enable:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 11e5dbde331b..f27679266197 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -179,6 +179,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
#define KVM_EXIT_TDX 40
+#define KVM_EXIT_RLIMIT 41
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -473,6 +474,11 @@ struct kvm_run {
} setup_event_notify;
};
} tdx;
+ /* KVM_EXIT_RLIMIT */
+ struct {
+ __u64 excess;
+ __u8 rlimit_id;
+ } rlimit;
/* Fix the size of the union. */
char padding[256];
};
--
2.51.2