[RFC PATCH 0/6] Improve VM DVFS and task placement behavior

Marc Zyngier maz at kernel.org
Tue Apr 4 13:49:10 PDT 2023


On Tue, 04 Apr 2023 20:43:40 +0100,
Oliver Upton <oliver.upton at linux.dev> wrote:
> 
> Folks,
> 
> On Thu, Mar 30, 2023 at 03:43:35PM -0700, David Dai wrote:
> 
> <snip>
> 
> > PCMark
> > Higher is better
> > +-------------------+----------+------------+--------+-------+--------+
> > | Test Case (score) | Baseline |  Hypercall | %delta |  MMIO | %delta |
> > +-------------------+----------+------------+--------+-------+--------+
> > | Weighted Total    |     6136 |       7274 |   +19% |  6867 |   +12% |
> > +-------------------+----------+------------+--------+-------+--------+
> > | Web Browsing      |     5558 |       6273 |   +13% |  6035 |    +9% |
> > +-------------------+----------+------------+--------+-------+--------+
> > | Video Editing     |     4921 |       5221 |    +6% |  5167 |    +5% |
> > +-------------------+----------+------------+--------+-------+--------+
> > | Writing           |     6864 |       8825 |   +29% |  8529 |   +24% |
> > +-------------------+----------+------------+--------+-------+--------+
> > | Photo Editing     |     7983 |      11593 |   +45% | 10812 |   +35% |
> > +-------------------+----------+------------+--------+-------+--------+
> > | Data Manipulation |     5814 |       6081 |    +5% |  5327 |    -8% |
> > +-------------------+----------+------------+--------+-------+--------+
> > 
> > PCMark Performance/mAh
> > Higher is better
> > +-----------+----------+-----------+--------+------+--------+
> > |           | Baseline | Hypercall | %delta | MMIO | %delta |
> > +-----------+----------+-----------+--------+------+--------+
> > | Score/mAh |       79 |        88 |   +11% |   83 |    +7% |
> > +-----------+----------+-----------+--------+------+--------+
> > 
> > Roblox
> > Higher is better
> > +-----+----------+------------+--------+-------+--------+
> > |     | Baseline |  Hypercall | %delta |  MMIO | %delta |
> > +-----+----------+------------+--------+-------+--------+
> > | FPS |    18.25 |      28.66 |   +57% | 24.06 |   +32% |
> > +-----+----------+------------+--------+-------+--------+
> > 
> > Roblox Frames/mAh
> > Higher is better
> > +------------+----------+------------+--------+--------+--------+
> > |            | Baseline |  Hypercall | %delta |   MMIO | %delta |
> > +------------+----------+------------+--------+--------+--------+
> > | Frames/mAh |    91.25 |     114.64 |   +26% | 103.11 |   +13% |
> > +------------+----------+------------+--------+--------+--------+
> 
> </snip>
> 
> > Next steps:
> > ===========
> > We are continuing to look into communication mechanisms other than
> > hypercalls that are just as/more efficient and avoid switching into the VMM
> > userspace. Any inputs in this regard are greatly appreciated.
> 
> We're highly unlikely to entertain such an interface in KVM.
> 
> The entire feature is dependent on pinning vCPUs to physical cores, for which
> userspace is in the driver's seat. That is a well established and documented
> policy which can be seen in the way we handle heterogeneous systems and
> vPMU.
> 
> Additionally, this bloats the KVM PV ABI with highly VMM-dependent interfaces
> that I would not expect to benefit the typical user of KVM.
> 
> Based on the data above, it would appear that the userspace implementation is
> in the same neighborhood as a KVM-based implementation, which only further
> weakens the case for moving this into the kernel.
> 
> I certainly can appreciate the motivation for the series, but this feature
> should be in userspace as some form of a virtual device.

+1 on all of the above.

The one thing I'd like to understand is why the comment seems to imply
that there is a significant difference in overhead between a hypercall
and an MMIO access. In my experience, both are pretty similar in cost
for a given handling location (both in userspace or both in the
kernel). MMIO handling is a tiny bit more expensive due to a guaranteed
TLB miss followed by a walk of the in-kernel device ranges, but that's
all. It should hardly register.

And if you really want some super-low latency, low overhead
signalling, maybe an exception is the wrong tool for the job. Shared
memory communication could be more appropriate.
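
As a rough illustration of the shared-memory idea, here is a minimal
sketch, assuming a single page shared between the guest and the VMM.
The structure, the field names and the 0..1024 utilization scale are
all hypothetical and not taken from the posted series. The guest
publishes a request with plain stores, so no exit is taken, and the
host picks up the latest value whenever it chooses to look:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of a page shared between the guest and the VMM. */
struct dvfs_mailbox {
	_Atomic uint32_t seq;	/* bumped by the guest after each update */
	_Atomic uint32_t util;	/* requested utilization, assumed 0..1024 */
};

/* Guest side: publish a new request without trapping to the host. */
static void guest_publish(struct dvfs_mailbox *mb, uint32_t util)
{
	atomic_store_explicit(&mb->util, util, memory_order_relaxed);
	/* Release ordering makes the util store visible before the bump. */
	atomic_fetch_add_explicit(&mb->seq, 1, memory_order_release);
}

/*
 * Host side: return 1 and fill *util if the guest has published since
 * *last_seq was recorded, 0 otherwise. Only the most recent request is
 * reported; intermediate ones are deliberately dropped.
 */
static int host_poll(struct dvfs_mailbox *mb, uint32_t *last_seq,
		     uint32_t *util)
{
	uint32_t seq = atomic_load_explicit(&mb->seq, memory_order_acquire);

	if (seq == *last_seq)
		return 0;

	*util = atomic_load_explicit(&mb->util, memory_order_relaxed);
	*last_seq = seq;
	return 1;
}
```

The trade-off is that the host has to decide when to look (on a timer,
or at the next exit taken for unrelated reasons), so this swaps the
cost of a trap for some latency in noticing the update.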

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list