[PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Thu Apr 30 07:51:20 PDT 2026

On Thu, 30 Apr 2026 14:29:37 +0100,
Leonardo Bras <leo.bras at arm.com> wrote:
> 
> On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> > On Thu, 30 Apr 2026 12:14:04 +0100,
> > Leonardo Bras <leo.bras at arm.com> wrote:
> > 
> > > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> > >     the buffer should be, so I used 1x PAGE_SIZE; when it gets full,
> > >     it's cleaned and reused. Should I let users configure that via a
> > >     parameter, or is that overthinking it?
> > 
> > How long is a piece of string? We can't know that. A single page feels
> > very small in the 4kB case, and letting userspace define the size of
> > that buffer seems a likely requirement.
> > 
> 
> Ok, as a KVM parameter, or as a compile-time option?

Noticed the "userspace" word in there? It *has* to be controlled by
userspace one way or another. So not as a kernel parameter, and
*never* as a compile option.

> > > Kernel v7.0.0 + this patchset builds properly and passes both kvm
> > > selftests for dirty-bit tracking[2], with HW HACDBS enabled or disabled.
> > 
> > I have absolutely no trust in these tests.
> > 
> > Have you enabled a VMM to make use of these APIs, and actively
> > migrated running guests? That's the level of testing I'd like to see,
> > as the selftests are not what people run in production...
> > 
> 
> There is no enablement needed on VMM side.
> Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> on the same host (inside a model).
> 
> That was the first test I used, but then I found out that the kvm
> selftests stress multiple scenarios more easily.

Except when they don't. In my experience, the selftests are only there
to give the CI people the fuzzy feeling that they are doing something
useful. I have a collection of examples indicating that what these
things test is not representative of the bugs we have in KVM.

> Would you prefer that I test a specific scenario, or are qemu's default
> parameters good enough?

I want to hear about testing at a scale that makes sense for production
VMs, including live migrating between hosts while under memory
pressure (swapping out).

I'm also interested in efficiency: how much better is HACDBS compared
to the current page faulting? Just having patches for a feature is not
enough to decide adoption of that feature. Show me the benefits in a
quantitative way (within the limits of the model, of course).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.