[RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined dirty log
Shameerali Kolothum Thodi
shameerali.kolothum.thodi at huawei.com
Thu Oct 12 00:51:19 PDT 2023
Hi,
> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 18 September 2023 10:55
> To: Oliver Upton <oliver.upton at linux.dev>
> Cc: kvmarm at lists.linux.dev; kvm at vger.kernel.org;
> linux-arm-kernel at lists.infradead.org; maz at kernel.org; will at kernel.org;
> catalin.marinas at arm.com; james.morse at arm.com;
> suzuki.poulose at arm.com; yuzenghui <yuzenghui at huawei.com>; zhukeqian
> <zhukeqian1 at huawei.com>; Jonathan Cameron
> <jonathan.cameron at huawei.com>; Linuxarm <linuxarm at huawei.com>
> Subject: RE: [RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined
> dirty log
[...]
> > > Please let me know if there is a specific workload you have in mind.
> >
> > No objection to the workload you've chosen, I'm more concerned about
> the
> > benchmark finishing before live migration completes.
> >
> > What I'm looking for is something like this:
> >
> > - Calculate the ops/sec your benchmark completes in steady state
> >
> > - Do a live migration and sample the rate throughout the benchmark,
> > accounting for VM blackout time
> >
> > - Calculate the area under the curve of:
> >
> > y = steady_state_rate - live_migration_rate(t)
> >
> > - Compare the area under the curve for write-protection and your DBM
> > approach.
>
> Ok. Got it.
I attempted to benchmark this series more thoroughly, as suggested above.
I used memcached/memaslap instead of redis-benchmark, since memaslap dirties
memory at a faster rate than redis-benchmark in my setup.
./memaslap -s 127.0.0.1:11211 -S 1s -F ./memslap.cnf -T 96 -c 96 -t 20m
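
For reference, the area-under-the-curve comparison suggested above boils
down to something like the rough Python sketch below. The (time, ops/sec)
samples and the steady-state rate are placeholders; the real series comes
from the benchmark's periodic stats output (memaslap -S 1s here).

def degradation_area(steady_state_rate, samples):
    """samples: list of (t_seconds, ops_per_sec) taken during migration,
    including the VM blackout (ops_per_sec == 0)."""
    area = 0.0
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        # Trapezoidal integration of the throughput deficit
        # y(t) = steady_state_rate - live_migration_rate(t).
        d0 = steady_state_rate - r0
        d1 = steady_state_rate - r1
        area += 0.5 * (d0 + d1) * (t1 - t0)
    return area  # "lost operations" over the migration window

# Made-up numbers, just to show the shape of the comparison:
wp_samples  = [(0, 95000), (60, 60000), (120, 0), (180, 94000)]
dbm_samples = [(0, 98000), (60, 80000), (120, 0), (180, 96000)]
print(degradation_area(100000, wp_samples))
print(degradation_area(100000, dbm_samples))
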
Please find the Google Sheet link below with charts comparing the average
throughput rates during the migration time window for the 6.5-org and
6.5-kvm-dbm branches.
https://docs.google.com/spreadsheets/d/1T2F94Lsjpx080hW8OSxwbTJXihbXDNlTE1HjWCC0J_4/edit?usp=sharing
Sheet #1: autoconverge=on with default settings (initial throttle 20, increment 10).
As the charts show, the kvm-dbm branch's throughput during the original
branch's migration window is considerably higher. However, the time for
migration to converge and finish increases at almost the same rate for
KVM-DBM. This in effect results in a lower overall average throughput when
compared over the same time window as the original branch.
Sheet #2: autoconverge=on with throttle-increment set to 15 for the kvm-dbm branch run.
If we increase the migration throttling rate for the kvm-dbm branch, it
looks like we can still get better throughput during the migration window
and also an overall higher throughput rate with the KVM-DBM solution.
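
For completeness, the throttle-increment here is QEMU's
cpu-throttle-increment migration parameter. A minimal, stdlib-only sketch
of setting it over QMP is below; the socket path and values are only
examples, and the same knob can also be set through libvirt.

import json
import socket

# Example path; e.g. a guest started with -qmp unix:/tmp/qmp.sock,server,nowait
QMP_SOCK = "/tmp/qmp.sock"

def qmp(sock, reader, cmd, args=None):
    req = {"execute": cmd}
    if args is not None:
        req["arguments"] = args
    sock.sendall((json.dumps(req) + "\r\n").encode())
    return json.loads(reader.readline())   # note: ignores async QMP events

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
    s.connect(QMP_SOCK)
    reader = s.makefile("r")
    reader.readline()                       # consume the QMP greeting
    qmp(s, reader, "qmp_capabilities")      # leave capabilities negotiation
    qmp(s, reader, "migrate-set-parameters",
        {"cpu-throttle-initial": 20, "cpu-throttle-increment": 15})
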
Sheet #3: captures the dirty_log_perf_test times vs. memory per vCPU.
This is also in line with the above results: KVM-DBM shows a better,
roughly constant dirty memory time compared with the linear increase seen
for the original, but it is just the opposite for the get dirty log time.
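
In case anyone wants to reproduce Sheet #3: the times come from the
upstream KVM selftest dirty_log_perf_test, and a small driver along these
lines sweeps the per-vCPU memory size (the vCPU count, iteration count and
sizes below are illustrative).

import subprocess

# Selftest binary built via "make -C tools/testing/selftests/kvm".
TEST = "./tools/testing/selftests/kvm/dirty_log_perf_test"

# Example sweep: 64 vCPUs, 5 dirty-log iterations, growing memory per vCPU.
for mem in ["256M", "512M", "1G", "2G", "4G"]:
    res = subprocess.run([TEST, "-v", "64", "-b", mem, "-i", "5"],
                         capture_output=True, text=True, check=True)
    print(f"--- {mem} per vCPU ---")
    print(res.stdout)   # contains the dirty memory / get dirty log times
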
From the above, it looks to me that there is value in using HW DBM for
write-intensive workloads, provided we adjust the CPU throttling in user space.
Please take a look and let me know your feedback/thoughts.
Thanks,
Shameer