[PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64

Jason Gunthorpe jgg at nvidia.com
Tue Jan 23 17:27:23 PST 2024


On Tue, Jan 23, 2024 at 08:38:55PM +0000, Catalin Marinas wrote:
> (fixed Marc's email address)
> 
> On Wed, Jan 17, 2024 at 01:29:06PM +0000, Mark Rutland wrote:
> > On Wed, Jan 17, 2024 at 08:36:18AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 17, 2024 at 12:30:00PM +0000, Mark Rutland wrote:
> > > > On Tue, Jan 16, 2024 at 02:51:21PM -0400, Jason Gunthorpe wrote:
> > > > > I'm just revising this and I'm wondering if you know why ARM64 has this:
> > > > > 
> > > > > #define __raw_writeq __raw_writeq
> > > > > static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> > > > > {
> > > > > 	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
> > > > > }
> > > > > 
> > > > > Instead of
> > > > > 
> > > > > #define __raw_writeq __raw_writeq
> > > > > static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> > > > > {
> > > > > 	asm volatile("str %x0, %1" : : "rZ" (val), "m" (*(volatile u64 *)addr));
> > > > > }
> > > > > 
> > > > > ?? Like x86 has.
> > > > 
> > > > I believe this is for the same reason as doing so in all of our other IO
> > > > accessors.
> > > > 
> > > > We've deliberately ensured that our IO accessors use a single base register
> > > > with no offset as this is the only form that HW can represent in ESR_ELx.ISS.SRT
> > > > when reporting a stage-2 abort, which a hypervisor may use for
> > > > emulating IO.
> > > 
> > > Wow, harming bare metal performance to accommodate imperfect emulation
> > > sounds like a horrible reason :(
> > 
> > Having working functionality everywhere is a very good reason. :)
> > 
> > > So what happens with this patch where IO is done with STP? Are you
> > > going to tell me I can't do it because of this?
> > 
> > I'm not personally going to make that judgement, but it's certainly something
> > for Catalin and Will to consider (and I've added Marc in case he has any
> > opinion).
> 
> Good point, I missed this part. We definitely can't use STP in the I/O
> accessors, we'd have a big surprise when running the same code in a
> guest with emulated I/O.

Unfortunately there is no hard distinction in KVM/qemu for "emulated
IO" and "VFIO MMIO". Even devices using VFIO can get funneled down the
emulated path for legitimate reasons.

Again, userspace is already widely deployed using complex IO
accessors. ST4 has been out there for years, and the STP variant in
this patch is already being deployed in production environments.

Even if you refuse to take STP into mainline, it *will* be running in
VMs under ARM hypervisors.

What exactly do you think should be done about that?

I thought the guiding mantra here was that any time KVM does not
perfectly emulate bare metal it is a bug. "We can't assume all VMs are
Linux!". Indeed we recently had some long and *very* theoretical
discussions about possible incompatibilities due to KVM changes in the
memory attributes thread.

But here it seems to be just shrugging off something as catastrophic
as performance IO accessors *that are already widely deployed* not
working reliably in VMs!?!?

"Oh well, don't use them"!?

Damn I hope it crashes the VM and doesn't corrupt the MMIO. I just
debugged an x86 KVM issue with it corrupting VFIO MMIO and that was a
total nightmare to find.

> If eight STRs without other operations interleaved give us the
> write-combining on most CPUs (with Normal NC), we should go with this
> instead of STP.

__iowrite64_copy() is a performance IO accessor, we should not degrade
it because buggy hypervisors might exist that have a problem with STP
or other instructions. :( :(

Anyhow, I know nothing about whatever this issue is - Mark said:

 > FWIW, IIUC the immediate-offset forms *without* writeback can still
 > be reported usefully in ESR_ELx,

Which excludes the post/pre increment forms - but do STP and ST4
also have some kind of problem because the emulation path can't know
about wider than a 64 bit access?

What is the plan for ST64B? Don't get to use that either?

Jason 
