[PATCH v2 2/3] KVM: arm64: Add two page mapping counters for kvm_stat

Marc Zyngier maz at kernel.org
Tue Apr 20 14:52:07 BST 2021


On Tue, 20 Apr 2021 14:08:24 +0100,
Yoan Picchi <yoan.picchi at arm.com> wrote:
> 
> This patch adds two counters, regular_page_mapped and huge_page_mapped.
> regular_page_mapped is incremented when a page of the smallest granule
> size is mapped; this is usually a 4k, 16k or 64k page.
> huge_page_mapped is incremented when a huge page of any size larger than
> the smallest granule size is mapped.
> These counters only count pages mapped for guest data; they do not count
> the pages/blocks allocated for the page tables themselves, as I don't see
> a need to record those.
> 
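For illustration only (the pgtable.c hunk itself is not quoted in this
reply), the accounting presumably boils down to something along these
lines, keyed on the size of the new stage-2 leaf mapping:

	/* Sketch, not the actual patch: bump the new kvm_vm_stat counters. */
	static void stage2_account_mapping(struct kvm *kvm, u64 map_size)
	{
		if (map_size == PAGE_SIZE)	/* smallest granule: 4k/16k/64k */
			kvm->stat.regular_page_mapped++;
		else				/* any larger (block) mapping */
			kvm->stat.huge_page_mapped++;
	}
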
> I can see two use cases for these counters:
> 	We can detect memory pressure in the host when the guest gets
> regular pages instead of huge ones.
> 	It may help detect abnormal memory usage, such as recurring
> allocations continuing after the kernel and a few programs have started.
> Together with the previous patch about stage2_abort_exit, it has the added
> benefit of identifying the second main cause of stage 2 page faults (the
> other being MMIO accesses).
> 
> To test this patch I started a guest VM and monitored the page mappings.
> By default it only maps huge pages. I then disabled huge pages with:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> After restarting the VM, it no longer mapped any huge pages, only regular
> pages.
> 
> I can't log into the guest because it doesn't recognize my keyboard.

Maybe you should consider having a look at what is going wrong
here. If your guest isn't working correctly, trying to account for
page allocation is a bit... irrelevant.

> To circumvent that, I added a command to the init script that needs some
> memory: cat /dev/zero | head -c 1000m | tail
> This takes 1 GiB of memory before finishing.
> From memory, it mapped 525 or so huge pages, which is around what I would
> expect with 2MB pages (1000 MiB / 2 MiB = 500).
> 
> I also checked the relation between stage 2 exits, MMIO exits and
> mappings. MMIO plus mappings account for almost all of the stage 2 exits,
> as expected. Only about 20 exits during the kernel boot were neither an
> MMIO access nor a mapping. I did not look into what they are, but they
> could be memory permission relaxations or page resizes.
> 
> My main concern here is the case where we replace a page/block with
> another one or resize a block. I don't fully understand that mechanism
> yet, so I don't know whether it should be counted as a new mapping or
> not. For now I don't account for it.

None of this discussion belongs in a commit message.

> 
> Signed-off-by: Yoan Picchi <yoan.picchi at arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 2 ++
>  arch/arm64/kvm/guest.c            | 2 ++
>  arch/arm64/kvm/hyp/pgtable.c      | 5 +++++
>  3 files changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 02891ce94..8f9d27571 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -547,6 +547,8 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
>  
>  struct kvm_vm_stat {
>  	ulong remote_tlb_flush;
> +	ulong regular_page_mapped;
> +	ulong huge_page_mapped;
>  };
>  
>  struct kvm_vcpu_stat {
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 82a4b6275..41316b30e 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -42,6 +42,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	VCPU_STAT("exits", exits),
>  	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
>  	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VM_STAT("regular_page_mapped", regular_page_mapped),
> +	VM_STAT("huge_page_mapped", huge_page_mapped),

As pointed out in the previous review, two sizes don't quite fit
all. There is a continuum of page, contiguous pages and block mappings
of various sizes (although KVM doesn't support the contiguous hint at
S2 yet).

For this information to be useful, you'd have to at least distinguish
between 2M and 1G mappings when operating with a 4k base granule
size. If I have enough memory to get contiguous 1G chunks, I surely
don't want them to show up as 2M mappings because I messed up the
userspace alignment requirements, and this counter can't distinguish
between these cases.
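
One way to get there (purely illustrative; the function name and the
per-level array below are invented, not something the patch provides)
would be to key the counter on the stage-2 mapping level rather than on
a page/huge binary:

	/* Sketch: with a 4k granule, level 3 = 4k, level 2 = 2M, level 1 = 1G. */
	struct kvm_vm_stat {
		ulong remote_tlb_flush;
		ulong pages_mapped[4];		/* one counter per stage-2 level */
	};

	static void stage2_account_mapping(struct kvm *kvm, u32 level)
	{
		kvm->stat.pages_mapped[level]++;
	}

That way, memory that could have been mapped as 1G but ends up as 2M
mappings (for example due to userspace alignment) shows up as such.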

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


