[PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Marc Zyngier
maz at kernel.org
Wed Jan 24 05:32:22 PST 2024
On Wed, 24 Jan 2024 13:06:38 +0000,
Jason Gunthorpe <jgg at nvidia.com> wrote:
>
> On Wed, Jan 24, 2024 at 08:26:28AM +0000, Marc Zyngier wrote:
>
> > > Even if you refuse to take STP to mainline it *will* be running in VMs
> > > under ARM hypervisors.
> >
> > A hypervisor can't do anything with it. If you cared to read the
> > architecture, you'd know by now. So your VM will be either dead, or
> > dog slow, depending on your hypervisor. In any case, I'm sure it will
> > reflect positively on your favourite software.
>
> "Dog slow" is fine. Forcing IO emulation on paths that shouldn't have
> it is a VMM problem. KVM & qemu have some issues where this can happen
> infrequently for VFIO MMIO maps. It is just important that it be
> functionally correct if you get unlucky. The performance path is to
> not take a fault in the first place.
>
> > > What exactly do you think should be done about that?
> >
> > Well, you could use KVM_CAP_ARM_NISV_TO_USER in userspace and see
> > everything slow down. Your call.
>
> The issue Mark raised here was that things like STP/etc cannot work in
> VMs, not that they are slow.
>
> The places we are talking about using the STP pattern are all
> high-performance HW drivers that do not have any existing SW emulation
> to worry about, i.e. the VMM will be using VFIO to back the MMIO the
> accessors target.
>
> So, I'm fine if the answer is that VMMs using VFIO need to use
> KVM_CAP_ARM_NISV_TO_USER and do instruction parsing for emulated IO in
> userspace if they have a design where VFIO MMIO can infrequently
> generate faults. That is all VMM design stuff and has nothing to do
> with the kernel.
Which will work a treat with things like CCA, I'm sure.
>
> My objection is this notion we should degrade a performance hot path
> in drivers to accommodate an ARM VMM issue that should be solved in the
> VMM.
>
> > Or you can stop whining and try to get better performance out of what
> > we have today.
>
> "better performance"!?!? You are telling me I have to destroy one of
> our important fast paths for HPC workloads to accommodate some
> theoretical ARM KVM problem?
What I'm saying is that there are ways to make it better without
breaking your particular toy workload which, as important as it may be
to *you*, doesn't cover everybody's use case.
Mark did post such an example, which has the potential to deliver that
improvement. I'd suggest that you give it a go.
But your attitude of "who cares if it breaks as long as it works for
me" is not something I can adhere to.
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel mailing list