[PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64

Wed Jan 24 05:32:22 PST 2024

On Wed, 24 Jan 2024 13:06:38 +0000,
Jason Gunthorpe <jgg at nvidia.com> wrote:
> 
> On Wed, Jan 24, 2024 at 08:26:28AM +0000, Marc Zyngier wrote:
> 
> > > Even if you refuse to take STP to mainline it *will* be running in VMs
> > > under ARM hypervisors.
> > 
> > A hypervisor can't do anything with it. If you cared to read the
> > architecture, you'd know by now. So your VM will be either dead, or
> > dog slow, depending on your hypervisor. In any case, I'm sure it will
> > reflect positively on your favourite software.
> 
> "Dog slow" is fine. Forcing IO emulation on paths that shouldn't have
> it is a VMM problem. KVM & qemu have some issues where this can happen
> infrequently for VFIO MMIO maps. It is just important that it be
> functionally correct if you get unlucky. The performance path is to
> not take a fault in the first place.
> 
> > > What exactly do you think should be done about that?
> > 
> > Well, you could use KVM_CAP_ARM_NISV_TO_USER in userspace and see
> > everything slow down. Your call.
> 
> The issue Mark raised here was that things like STP/etc cannot work in
> VMs, not that they are slow.
>
> The places we are talking about using the STP pattern are all high
> performance HW drivers, that do not have any existing SW emulation to
> worry about. ie the VMM will be using VFIO to back the MMIO the
> acessors target.
> 
> So, I'm fine if the answer is that VMM's using VFIO need to use
> KVM_CAP_ARM_NISV_TO_USER and do instruction parsing for emulated IO in
> userspace if they have a design where VFIO MMIO can infrequently
> generate faults. That is all VMM design stuff and has nothing to do
> with the kernel.

Which will work a treat with things like CCA, I'm sure.

> 
> My objection is this notion we should degrade a performance hot path
> in drivers to accomodate an ARM VMM issue that should be solved in the
> VMM.
> 
> > Or you can stop whining and try to get better performance out of what
> > we have today.
> 
> "better performance"!?!? You are telling me I have to destroy one of
> our important fast paths for HPC workloads to accommodate some
> theoretical ARM KVM problem?

What I'm saying is that there are way to make it better without
breaking your particular toy workload which, as important as it may be
to *you*, doesn't cover everybody's use case.

Mark did post such an example that has the potential of having that
improvement. I'd suggest that you give it a go.

But your attitude of "who cares if it breaks as long as it works for
me" is not something I can adhere to.

	M.

-- 
Without deviation from the norm, progress is not possible.