[PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
Yosry Ahmed
yosry.ahmed at linux.dev
Thu Jan 8 10:01:36 PST 2026
On Thu, Jan 08, 2026 at 08:32:44AM -0800, Sean Christopherson wrote:
> On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > On Tue, Dec 30, 2025 at 03:01:50PM -0800, Sean Christopherson wrote:
> > > WRITE_ONCE(*b, 1);
> > > - GUEST_SYNC(true);
> > > + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > > WRITE_ONCE(*b, 1);
> > > - GUEST_SYNC(true);
> > > - GUEST_SYNC(false);
> > > + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > > + READ_ONCE(*b);
> > > + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> > > + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> >
> > Instead of hardcoding 0 and 2 here, which IIUC correspond to the
> > physical addresses 0xc0000000 and 0xc0002000, as well as indices in
> > host_test_mem, can we make the overall definitions a bit more intuitive?
> >
> > For example:
> >
> > #define GUEST_GPA_START 0xc0000000
> > #define GUEST_PAGE1_IDX 0
> > #define GUEST_PAGE2_IDX 1
> > #define GUEST_GPA_PAGE1 (GUEST_GPA_START + GUEST_PAGE1_IDX * PAGE_SIZE)
> > #define GUEST_GPA_PAGE2 (GUEST_GPA_START + GUEST_PAGE2_IDX * PAGE_SIZE)
> >
> > /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 */
> > #define GUEST_GVA_PAGE1 0xd0000000
> > #define GUEST_GVA_PAGE2 0xd0002000
> >
> > /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 using TDP in L1 */
> > #define GUEST_GVA_NESTED_PAGE1 0xd0001000
> > #define GUEST_GVA_NESTED_PAGE2 0xd0003000
> >
> > Then in L2 code, we can explicitly take in the GVA of page1 and page2
> > and use the definitions above in the GUEST_SYNC() calls, for example:
> >
> > static void l2_guest_code(u64 *page1_gva, u64 *page2_gva)
> > {
> > READ_ONCE(*page1_gva);
> > GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_READ_FAULT);
> > WRITE_ONCE(*page1_gva, 1);
> > GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_WRITE_FAULT);
> > ...
> > }
> >
> > and we can explicitly read page1 and page2 from the host (instead of
> > using host_test_mem).
> >
> > Alternatively, we can pass in the guest GVA directly into GUEST_SYNC(),
> > and use the lower bits for TEST_SYNC_READ_FAULT, TEST_SYNC_WRITE_FAULT,
> > and TEST_SYNC_NO_FAULT.
> >
> > WDYT?
>
> I fiddled with this a bunch and came up with the below. It's more or less what
> you're suggesting, but instead of interleaving the aliases, it simply puts them
> at a higher base. That makes pulling the page frame number out of the GVA much
> cleaner, as it's simply arithmetic instead of weird masking and shifting magic.
>
> --
> From: Sean Christopherson <seanjc at google.com>
> Date: Wed, 7 Jan 2026 14:38:32 -0800
> Subject: [PATCH] KVM: selftests: Test READ=>WRITE dirty logging behavior for
> shadow MMU
>
> Update the nested dirty log test to validate KVM's handling of READ faults
> when dirty logging is enabled. Specifically, set the Dirty bit in the
> guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
> when handling L2 read faults. When handling read faults in the shadow MMU,
> KVM opportunistically creates a writable SPTE if the mapping can be
> writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
> if KVM doesn't need to intercept writes in order to emulate Dirty-bit
> updates.
>
> To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
> pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
> sequences to separate L1 pages, and differentiate between "marked dirty
> due to a WRITE access/fault" and "marked dirty due to creating a writable
> SPTE for a READ access/fault". The updated sequence exposes the bug fixed
> by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
> with PML") when the guest performs a READ=>WRITE sequence with dirty guest
> PTEs.
>
> Opportunistically tweak and rename the address macros, and add comments,
> to make it more obvious what the test is doing. E.g. NESTED_TEST_MEM1
> vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
> creating aliases in both the L2 GPA and GVA address spaces, but only when
> L1 is using TDP to run L2.
>
> Signed-off-by: Sean Christopherson <seanjc at google.com>
> ---
> .../selftests/kvm/include/x86/processor.h | 1 +
> .../testing/selftests/kvm/lib/x86/processor.c | 7 +
> .../selftests/kvm/x86/nested_dirty_log_test.c | 188 +++++++++++++-----
> 3 files changed, 145 insertions(+), 51 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index ab29b1c7ed2d..8945c9eea704 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1483,6 +1483,7 @@ bool kvm_cpu_has_tdp(void);
> void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
> void tdp_identity_map_default_memslots(struct kvm_vm *vm);
> void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa);
>
> /*
> * Basic CPU control in CR0
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index ab869a98bbdc..fab18e9be66c 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
> return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
> }
>
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)

tdp_map() calls this parameter nested_paddr, maybe use the same name here
(and in the header)?

> +{
> + int level = PG_LEVEL_4K;
> +
> + return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
> +}
> +
> uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
> {
> int level = PG_LEVEL_4K;
[..]
> @@ -133,35 +220,50 @@ static void test_dirty_log(bool nested_tdp)
>
> /* Add an extra memory slot for testing dirty logging */
> vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> - GUEST_TEST_MEM,
> + TEST_MEM_BASE,
> TEST_MEM_SLOT_INDEX,
> TEST_MEM_PAGES,
> KVM_MEM_LOG_DIRTY_PAGES);
>
> /*
> - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> * affects both L1 and L2. However...
> */
> - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> + virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
>
> /*
> - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> - * 0xc0000000.
> + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> + * respectively.

Are these ranges correct? I thought the L2 GPA range [0xc0002000,
0xc0004000) maps to [0xc0000000, 0xc0002000).

Also, perhaps it's better to express these in terms of the macros, e.g.:

  L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
  maps to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE)

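
For illustration only, a rough standalone sketch of the arithmetic behind the
macro-based wording. The value of TEST_MEM_ALIAS_BASE, the TEST_ALIASED_PAGES
name, and the alias_to_l1_gpa() helper are assumptions, since the macro
definitions are not part of the quoted hunk. Only 0xc0000000 as the base comes
from the comment above:

```c
#include <assert.h>
#include <stdint.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE		0x1000ull
#endif

#define TEST_MEM_BASE		0xc0000000ull	/* from the quoted comment */
#define TEST_MEM_ALIAS_BASE	0xc0002000ull	/* ASSUMPTION: not shown in the hunk */
#define TEST_ALIASED_PAGES	2ull		/* hypothetical name */

/*
 * Hypothetical helper: translate an aliased L2 GPA to the L1 GPA it is
 * expected to map to, if the alias range sits at a fixed offset from the
 * base. The assert rejects addresses outside the assumed alias range.
 */
static inline uint64_t alias_to_l1_gpa(uint64_t l2_gpa)
{
	assert(l2_gpa >= TEST_MEM_ALIAS_BASE &&
	       l2_gpa < TEST_MEM_ALIAS_BASE + TEST_ALIASED_PAGES * PAGE_SIZE);

	return TEST_MEM_BASE + (l2_gpa - TEST_MEM_ALIAS_BASE);
}
```

If the alias base really is two pages above the test base, the comment can be
stated once in terms of the macros and stays correct if either base moves.
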
> *
> * When TDP is disabled, the L2 guest code will still access the same L1
> * GPAs as the TDP enabled case.
> + *
> + * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> + * writable SPTEs when handling read faults (if the Dirty bit isn't
> + * set, KVM must intercept the next write to emulate the Dirty bit
> + * update).
> */
> if (nested_tdp) {
> + vm_vaddr_t gva0 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 0);
> + vm_vaddr_t gva1 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 1);

Why are these GVAs? Shouldn't these be L2 GPAs? Maybe 'uint64_t l2_gpa0' or
'uint64_t nested_paddr0'?

Also, maybe add a TEST_ALIAS_GPA() macro to keep things consistent?

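
A minimal sketch of what such a macro could look like, untested against the
real test. The definitions of TEST_GUEST_ADDR() and TEST_GPA() are elided from
the quoted patch, so the shapes below are assumptions, as are the value of
TEST_MEM_ALIAS_BASE and the test_alias_gpa_to_idx() helper:

```c
#include <assert.h>
#include <stdint.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE		0x1000ull
#endif

#define TEST_MEM_BASE		0xc0000000ull
#define TEST_MEM_ALIAS_BASE	0xc0002000ull	/* ASSUMPTION: not shown in the hunk */

/* ASSUMED shape; the real definition is not visible in the quoted patch. */
#define TEST_GUEST_ADDR(base, idx)	((base) + (uint64_t)(idx) * PAGE_SIZE)
#define TEST_GPA(idx)			TEST_GUEST_ADDR(TEST_MEM_BASE, idx)

/* Hypothetical addition: L2 GPA of the alias for test page 'idx'. */
#define TEST_ALIAS_GPA(idx)		TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, idx)

/*
 * Hypothetical helper: recover the test page index from an aliased L2 GPA.
 * With the aliases at a higher base this is plain subtraction and division,
 * with no masking or shifting.
 */
static inline uint64_t test_alias_gpa_to_idx(uint64_t l2_gpa)
{
	assert(l2_gpa >= TEST_MEM_ALIAS_BASE);

	return (l2_gpa - TEST_MEM_ALIAS_BASE) / PAGE_SIZE;
}
```

With something like this, the nested_tdp branch could declare
'uint64_t l2_gpa0 = TEST_ALIAS_GPA(0);' and pass l2_gpa0 to tdp_map() and
tdp_get_pte(), which keeps the GVA/GPA distinction visible in the names.
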
> +
> tdp_identity_map_default_memslots(vm);
> - tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
> - tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
> + tdp_map(vm, gva0, TEST_GPA(0), PAGE_SIZE);
> + tdp_map(vm, gva1, TEST_GPA(1), PAGE_SIZE);
> +
> + *tdp_get_pte(vm, gva0) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> + *tdp_get_pte(vm, gva1) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> + } else {
> + *vm_get_pte(vm, TEST_GVA(0)) |= PTE_DIRTY_MASK(&vm->mmu);
> + *vm_get_pte(vm, TEST_GVA(1)) |= PTE_DIRTY_MASK(&vm->mmu);
> }
>
> bmap = bitmap_zalloc(TEST_MEM_PAGES);
> - host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
>
> while (!done) {
> - memset(host_test_mem, 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> + memset(TEST_HVA(vm, 0), 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> +
> vcpu_run(vcpu);
> TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
>
[..]