[PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU

Yosry Ahmed yosry.ahmed at linux.dev
Thu Jan 8 10:01:36 PST 2026


On Thu, Jan 08, 2026 at 08:32:44AM -0800, Sean Christopherson wrote:
> On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > On Tue, Dec 30, 2025 at 03:01:50PM -0800, Sean Christopherson wrote:
> > >  	WRITE_ONCE(*b, 1);
> > > -	GUEST_SYNC(true);
> > > +	GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > >  	WRITE_ONCE(*b, 1);
> > > -	GUEST_SYNC(true);
> > > -	GUEST_SYNC(false);
> > > +	GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > > +	READ_ONCE(*b);
> > > +	GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> > > +	GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> > 
> > Instead of hardcoding 0 and 2 here, which IIUC correspond to the
> > physical addresses 0xc0000000 and 0xc0002000, as well as indices in
> > host_test_mem, can we make the overall definitions a bit more intuitive?
> > 
> > For example:
> > 
> > #define GUEST_GPA_START		0xc0000000
> > #define GUEST_PAGE1_IDX		0
> > #define GUEST_PAGE2_IDX		1
> > #define GUEST_GPA_PAGE1		(GUEST_GPA_START + GUEST_PAGE1_IDX * PAGE_SIZE)
> > #define GUEST_GPA_PAGE2		(GUEST_GPA_START + GUEST_PAGE2_IDX * PAGE_SIZE)
> > 
> > /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 */
> > #define GUEST_GVA_PAGE1		0xd0000000
> > #define GUEST_GVA_PAGE2		0xd0002000
> > 
> > /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 using TDP in L1 */
> > #define GUEST_GVA_NESTED_PAGE1  0xd0001000
> > #define GUEST_GVA_NESTED_PAGE2	0xd0003000
> > 
> > Then in L2 code, we can explicitly take in the GVA of page1 and page2
> > and use the definitions above in the GUEST_SYNC() calls, for example:
> > 
> > static void l2_guest_code(u64 *page1_gva, u64 *page2_gva)
> > {
> >         READ_ONCE(*page1_gva);
> >         GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_READ_FAULT);
> >         WRITE_ONCE(*page1_gva, 1);
> >         GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_WRITE_FAULT);
> > 	...
> > }
> > 
> > and we can explicitly read page1 and page2 from the host (instead of
> > using host_test_mem).
> > 
> > Alternatively, we can pass in the guest GVA directly into GUEST_SYNC(),
> > and use the lower bits for TEST_SYNC_READ_FAULT, TEST_SYNC_WRITE_FAULT,
> > and TEST_SYNC_NO_FAULT.
> >
> > WDYT?
> 
> I fiddled with this a bunch and came up with the below.  It's more or less what
> you're suggesting, but instead of interleaving the aliases, it simply puts them
> at a higher base.  That makes pulling the page frame number out of the GVA much
> cleaner, as it's simply arithmetic instead of weird masking and shifting magic.
> 
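To make the layout argument concrete, here is a minimal C sketch, not code from the patch, of how a host-side helper might recover a page index from a test address in each layout. It assumes two backing pages at 0xc0000000 and a hypothetical TEST_MEM_ALIAS_BASE of 0xc0002000 for the split layout. For the interleaved layout it uses the 0xd0000000 addresses from the earlier proposal. All helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE		4096ULL
#define TEST_MEM_NR_PAGES	2ULL	/* backing pages touched by L2 (assumed) */

/* Split layout: aliases sit above the backing pages (alias base is hypothetical). */
#define TEST_MEM_BASE		0xc0000000ULL
#define TEST_MEM_ALIAS_BASE	0xc0002000ULL

static_assert(TEST_MEM_ALIAS_BASE >= TEST_MEM_BASE + TEST_MEM_NR_PAGES * PAGE_SIZE,
	      "aliases must not overlap the backing pages");

/* Interleaved layout from the earlier proposal: even pages real, odd pages aliases. */
#define INTERLEAVED_BASE	0xd0000000ULL

/* Split layout: pick the base, then plain subtraction and division. */
static uint64_t split_page_idx(uint64_t addr)	/* addr must be >= TEST_MEM_BASE */
{
	uint64_t base = addr >= TEST_MEM_ALIAS_BASE ? TEST_MEM_ALIAS_BASE : TEST_MEM_BASE;

	return (addr - base) / PAGE_SIZE;
}

static bool split_is_alias(uint64_t addr)
{
	return addr >= TEST_MEM_ALIAS_BASE;
}

/* Interleaved layout: shift out the low page-number bit to get the index... */
static uint64_t interleaved_page_idx(uint64_t addr)
{
	return ((addr - INTERLEAVED_BASE) / PAGE_SIZE) >> 1;
}

/* ...and mask it to tell an alias from the real page. */
static bool interleaved_is_alias(uint64_t addr)
{
	return ((addr - INTERLEAVED_BASE) / PAGE_SIZE) & 1;
}
```

The split layout only compares against a single alias base, whereas the interleaved layout has to shift and mask the page number, which is the extra "magic" being avoided.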
> --
> From: Sean Christopherson <seanjc at google.com>
> Date: Wed, 7 Jan 2026 14:38:32 -0800
> Subject: [PATCH] KVM: selftests: Test READ=>WRITE dirty logging behavior for
>  shadow MMU
> 
> Update the nested dirty log test to validate KVM's handling of READ faults
> when dirty logging is enabled.  Specifically, set the Dirty bit in the
> guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
> when handling L2 read faults.  When handling read faults in the shadow MMU,
> KVM opportunistically creates a writable SPTE if the mapping can be
> writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
> if KVM doesn't need to intercept writes in order to emulate Dirty-bit
> updates.
> 
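The Dirty-bit rule described above can be modeled as a small predicate. This is a hypothetical C model, not KVM code: the struct and function names are invented, and `can_be_writable` lumps together every reason a mapping may or may not be writable (gPTE, memslot, host mapping).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the state the shadow MMU considers; not a KVM struct. */
struct gpte_state {
	bool can_be_writable;	/* gPTE, memslot and host mapping all allow writes */
	bool has_dirty_bit;	/* the guest paging mode supports a Dirty bit */
	bool dirty;		/* the gPTE's Dirty bit is already set */
};

/*
 * On a READ fault, a writable SPTE can be created up front only if KVM does
 * not need to intercept the next write to emulate setting the Dirty bit.
 */
static bool read_fault_maps_writable(const struct gpte_state *s)
{
	return s->can_be_writable && (s->dirty || !s->has_dirty_bit);
}

/*
 * Whether the test expects the page to show up in the dirty log after a
 * single access: always for a successful WRITE, and for a READ only when
 * the fault installed a writable SPTE.
 */
static bool expect_dirty(const struct gpte_state *s, bool is_write)
{
	if (is_write)
		return s->can_be_writable;
	return read_fault_maps_writable(s);
}
```

Setting the Dirty bit in the gPTEs, as the patch does, moves the READ case from the first row of this model (clean gPTE, read-only SPTE) to the second (dirty gPTE, writable SPTE).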
> To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
> pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
> sequences to separate L1 pages, and differentiate between "marked dirty
> due to a WRITE access/fault" and "marked dirty due to creating a writable
> SPTE for a READ access/fault".  The updated sequence exposes the bug fixed
> by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
> with PML") when the guest performs a READ=>WRITE sequence with dirty guest
> PTEs.
> 
> Opportunistically tweak and rename the address macros, and add comments,
> to make it more obvious what the test is doing.  E.g. NESTED_TEST_MEM1
> vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
> creating aliases in both the L2 GPA and GVA address spaces, but only when
> L1 is using TDP to run L2.
> 
> Signed-off-by: Sean Christopherson <seanjc at google.com>
> ---
>  .../selftests/kvm/include/x86/processor.h     |   1 +
>  .../testing/selftests/kvm/lib/x86/processor.c |   7 +
>  .../selftests/kvm/x86/nested_dirty_log_test.c | 188 +++++++++++++-----
>  3 files changed, 145 insertions(+), 51 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index ab29b1c7ed2d..8945c9eea704 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1483,6 +1483,7 @@ bool kvm_cpu_has_tdp(void);
>  void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
>  void tdp_identity_map_default_memslots(struct kvm_vm *vm);
>  void tdp_identity_map_1g(struct kvm_vm *vm,  uint64_t addr, uint64_t size);
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa);
>  
>  /*
>   * Basic CPU control in CR0
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index ab869a98bbdc..fab18e9be66c 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
>  	return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
>  }
>  
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)

nested_paddr is the name used by tdp_map(); maybe use that here as well
(and in the header)?

> +{
> +	int level = PG_LEVEL_4K;
> +
> +	return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
> +}
> +
>  uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
>  {
>  	int level = PG_LEVEL_4K;
[..]
> @@ -133,35 +220,50 @@ static void test_dirty_log(bool nested_tdp)
>  
>  	/* Add an extra memory slot for testing dirty logging */
>  	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> -				    GUEST_TEST_MEM,
> +				    TEST_MEM_BASE,
>  				    TEST_MEM_SLOT_INDEX,
>  				    TEST_MEM_PAGES,
>  				    KVM_MEM_LOG_DIRTY_PAGES);
>  
>  	/*
> -	 * Add an identity map for GVA range [0xc0000000, 0xc0002000).  This
> +	 * Add an identity map for GVA range [0xc0000000, 0xc0004000).  This
>  	 * affects both L1 and L2.  However...
>  	 */
> -	virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> +	virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
>  
>  	/*
> -	 * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> -	 * 0xc0000000.
> +	 * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> +	 * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> +	 * respectively.

Are these ranges correct? I thought the L2 GPA range [0xc0002000,
0xc0004000) would map to [0xc0000000, 0xc0002000).

Also, perhaps it's better to express those in terms of the macros,
something like:

L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
will map to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE).
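
One way to sanity-check the mapping is to model the two 4KiB tdp_map() calls as a translation from L2 GPA to L1 GPA. This is a hypothetical sketch that assumes TEST_MEM_ALIAS_BASE is TEST_MEM_BASE + 2 * PAGE_SIZE, which is the reading behind the question above; the actual macro definitions are elided from this quote, and the helper name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE		4096ULL
#define TEST_MEM_BASE		0xc0000000ULL
/* Assumed value; the patch's definition is elided. */
#define TEST_MEM_ALIAS_BASE	(TEST_MEM_BASE + 2 * PAGE_SIZE)
#define TEST_MEM_ALIAS_PAGES	2ULL

/*
 * Hypothetical model of tdp_map(vm, TEST_MEM_ALIAS_BASE + i * PAGE_SIZE,
 * TEST_MEM_BASE + i * PAGE_SIZE, PAGE_SIZE) for i = 0, 1.  Returns false
 * for L2 GPAs outside the alias range.
 */
static bool alias_to_l1_gpa(uint64_t l2_gpa, uint64_t *l1_gpa)
{
	if (l2_gpa < TEST_MEM_ALIAS_BASE ||
	    l2_gpa >= TEST_MEM_ALIAS_BASE + TEST_MEM_ALIAS_PAGES * PAGE_SIZE)
		return false;

	*l1_gpa = TEST_MEM_BASE + (l2_gpa - TEST_MEM_ALIAS_BASE);
	return true;
}
```

Under that assumption, [0xc0002000, 0xc0004000) lands on [0xc0000000, 0xc0002000), while 0xc0001000 is not in the alias range at all.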

>  	 *
>  	 * When TDP is disabled, the L2 guest code will still access the same L1
>  	 * GPAs as the TDP enabled case.
> +	 *
> +	 * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> +	 * writable SPTEs when handling read faults (if the Dirty bit isn't
> +	 * set, KVM must intercept the next write to emulate the Dirty bit
> +	 * update).
>  	 */
>  	if (nested_tdp) {
> +		vm_vaddr_t gva0 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 0);
> +		vm_vaddr_t gva1 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 1);

Why are these GVAs? Shouldn't they be L2 GPAs?

Maybe 'uint64_t l2_gpa0' or 'uint64_t nested_paddr0'?

Also, maybe add a TEST_ALIAS_GPA() macro to keep things consistent?
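
A possible shape for such a macro, as a hypothetical sketch: TEST_GUEST_ADDR() and TEST_GPA() are used in the patch, but their definitions are elided here, so the versions below (and the alias base) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE		4096ULL
#define TEST_MEM_BASE		0xc0000000ULL
#define TEST_MEM_ALIAS_BASE	(TEST_MEM_BASE + 2 * PAGE_SIZE)	/* assumed */

/* Assumed shape of the patch's helper; its real definition is elided. */
#define TEST_GUEST_ADDR(base, idx)	((uint64_t)(base) + (uint64_t)(idx) * PAGE_SIZE)

/* L1 GPA of backing page @idx, and the L2 GPA that aliases it under TDP. */
#define TEST_GPA(idx)		TEST_GUEST_ADDR(TEST_MEM_BASE, idx)
#define TEST_ALIAS_GPA(idx)	TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, idx)

/* With the assumed base, the aliases start right after the backing pages. */
static_assert(TEST_ALIAS_GPA(0) == TEST_GPA(2), "alias range follows backing pages");
```

With that, the nested_tdp block could read `uint64_t l2_gpa0 = TEST_ALIAS_GPA(0);` followed by `tdp_map(vm, l2_gpa0, TEST_GPA(0), PAGE_SIZE);`, which makes it clear that the first argument is an L2 GPA rather than a GVA.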

> +
>  		tdp_identity_map_default_memslots(vm);
> -		tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
> -		tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
> +		tdp_map(vm, gva0, TEST_GPA(0), PAGE_SIZE);
> +		tdp_map(vm, gva1, TEST_GPA(1), PAGE_SIZE);
> +
> +		*tdp_get_pte(vm, gva0) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> +		*tdp_get_pte(vm, gva1) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> +	} else {
> +		*vm_get_pte(vm, TEST_GVA(0)) |= PTE_DIRTY_MASK(&vm->mmu);
> +		*vm_get_pte(vm, TEST_GVA(1)) |= PTE_DIRTY_MASK(&vm->mmu);
>  	}
>  
>  	bmap = bitmap_zalloc(TEST_MEM_PAGES);
> -	host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
>  
>  	while (!done) {
> -		memset(host_test_mem, 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> +		memset(TEST_HVA(vm, 0), 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> +
>  		vcpu_run(vcpu);
>  		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
>  
[..]



More information about the linux-riscv mailing list