[RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined dirty log
Shameerali Kolothum Thodi
shameerali.kolothum.thodi at huawei.com
Thu Oct 12 00:51:19 PDT 2023
Hi,
> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 18 September 2023 10:55
> To: Oliver Upton <oliver.upton at linux.dev>
> Cc: kvmarm at lists.linux.dev; kvm at vger.kernel.org;
> linux-arm-kernel at lists.infradead.org; maz at kernel.org; will at kernel.org;
> catalin.marinas at arm.com; james.morse at arm.com;
> suzuki.poulose at arm.com; yuzenghui <yuzenghui at huawei.com>; zhukeqian
> <zhukeqian1 at huawei.com>; Jonathan Cameron
> <jonathan.cameron at huawei.com>; Linuxarm <linuxarm at huawei.com>
> Subject: RE: [RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined
> dirty log
[...]
> > > Please let me know if there is a specific workload you have in mind.
> >
> > No objection to the workload you've chosen, I'm more concerned about
> the
> > benchmark finishing before live migration completes.
> >
> > What I'm looking for is something like this:
> >
> > - Calculate the ops/sec your benchmark completes in steady state
> >
> > - Do a live migration and sample the rate throughout the benchmark,
> > accounting for VM blackout time
> >
> > - Calculate the area under the curve of:
> >
> > y = steady_state_rate - live_migration_rate(t)
> >
> > - Compare the area under the curve for write-protection and your DBM
> > approach.
>
> Ok. Got it.
I attempted to benchmark this series more thoroughly, as suggested above.
I used memcached/memaslap instead of redis-benchmark, since memaslap dirties
memory at a faster rate than redis-benchmark in my setup.
./memaslap -s 127.0.0.1:11211 -S 1s -F ./memslap.cnf -T 96 -c 96 -t 20m
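
For reference, the area-under-the-curve comparison suggested above boils
down to something like the rough Python sketch below. The (time, ops/sec)
samples and the steady-state rate are placeholders; the real series comes
from the benchmark's periodic stats output (memaslap -S 1s here).

def degradation_area(steady_state_rate, samples):
    """samples: list of (t_seconds, ops_per_sec) taken during migration,
    including the VM blackout (ops_per_sec == 0)."""
    area = 0.0
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        # Trapezoidal integration of the throughput deficit
        # y(t) = steady_state_rate - live_migration_rate(t).
        d0 = steady_state_rate - r0
        d1 = steady_state_rate - r1
        area += 0.5 * (d0 + d1) * (t1 - t0)
    return area  # "lost operations" over the migration window

# Made-up numbers, just to show the shape of the comparison:
wp_samples  = [(0, 95000), (60, 60000), (120, 0), (180, 94000)]
dbm_samples = [(0, 98000), (60, 80000), (120, 0), (180, 96000)]
print(degradation_area(100000, wp_samples))
print(degradation_area(100000, dbm_samples))
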
Please find the Google Sheet link below with charts comparing the average
throughput rates during the migration time window for the 6.5-org and
6.5-kvm-dbm branches.
https://docs.google.com/spreadsheets/d/1T2F94Lsjpx080hW8OSxwbTJXihbXDNlTE1HjWCC0J_4/edit?usp=sharing
Sheet #1: autoconverge=on with default settings (initial throttle 20, increment 10).
As the charts show, the kvm-dbm branch's throughput during the original
branch's migration window is considerably higher. However, the time for
migration to converge and finish increases at almost the same rate for
KVM-DBM. This in effect results in a lower overall average throughput when
compared over the same time window as the original branch.
Sheet #2: autoconverge=on with throttle-increment set to 15 for the kvm-dbm branch run.
If we increase the migration throttling rate for the kvm-dbm branch, it
looks like we can still get better throughput during the migration window
and also an overall higher throughput rate with the KVM-DBM solution.
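
For completeness, the throttle-increment here is QEMU's
cpu-throttle-increment migration parameter. A minimal, stdlib-only sketch
of setting it over QMP is below; the socket path and values are only
examples, and the same knob can also be set through libvirt.

import json
import socket

# Example path; e.g. a guest started with -qmp unix:/tmp/qmp.sock,server,nowait
QMP_SOCK = "/tmp/qmp.sock"

def qmp(sock, reader, cmd, args=None):
    req = {"execute": cmd}
    if args is not None:
        req["arguments"] = args
    sock.sendall((json.dumps(req) + "\r\n").encode())
    return json.loads(reader.readline())   # note: ignores async QMP events

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
    s.connect(QMP_SOCK)
    reader = s.makefile("r")
    reader.readline()                       # consume the QMP greeting
    qmp(s, reader, "qmp_capabilities")      # leave capabilities negotiation
    qmp(s, reader, "migrate-set-parameters",
        {"cpu-throttle-initial": 20, "cpu-throttle-increment": 15})
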
Sheet #3: captures the dirty_log_perf_test times vs. memory per vCPU.
This is also in line with the above results: KVM-DBM shows a better,
roughly constant dirty memory time compared with the linear increase seen
for the original, but it is just the opposite for the get dirty log time.
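
In case anyone wants to reproduce Sheet #3: the times come from the
upstream KVM selftest dirty_log_perf_test, and a small driver along these
lines sweeps the per-vCPU memory size (the vCPU count, iteration count and
sizes below are illustrative).

import subprocess

# Selftest binary built via "make -C tools/testing/selftests/kvm".
TEST = "./tools/testing/selftests/kvm/dirty_log_perf_test"

# Example sweep: 64 vCPUs, 5 dirty-log iterations, growing memory per vCPU.
for mem in ["256M", "512M", "1G", "2G", "4G"]:
    res = subprocess.run([TEST, "-v", "64", "-b", mem, "-i", "5"],
                         capture_output=True, text=True, check=True)
    print(f"--- {mem} per vCPU ---")
    print(res.stdout)   # contains the dirty memory / get dirty log times
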
From the above, it looks to me that there is value in using HW DBM for
write-intensive workloads, provided we adjust the CPU throttling in user space.
Please take a look and let me know your feedback/thoughts.
Thanks,
Shameer