[PATCH] arm64: Enable PCI write-combine resources under sysfs

Benjamin Herrenschmidt benh at kernel.crashing.org
Thu Sep 10 17:46:47 EDT 2020

On Thu, 2020-09-10 at 14:10 -0300, Jason Gunthorpe wrote:
> Can you explain what this actually does on ARM?
> Can it ever speculate loads across page boundaries, or speculate loads
> that never exist in the program? ie will we get random unpredictable
> MemRds?

Probably. At least on powerpc you will get them as well; that's the
only way to get write combining.

> Does it/could it "combine writes"?

I assume so for ARM, definitely for powerpc.

> > > If the CPU fails to generate a 64 byte TLP then the device will still
> > > operate correctly but does a different, slower, flow.
> >
> > Side note: on ARM that TLP is not a native interconnect transaction,
> > reworded, it depends on what the system-bus->PCI logic does in
> > this respect.
>
> I think the issue is that ARM never defined what the bits set by
> pgprot_writecombine() do at a system level so we see implementations
> that do not cause write combining at the PCI-E interface for those
> bits. (I assume from what I've heard)

Nobody did. I think only x86 has a real "write combine" attribute. I
tried to untangle that mess years ago and didn't get to the bottom of
it, but basically, on non-x86 archs, pgprot_writecombine will give you
what you asked for ... and more.

> > That's why I looped you in - that's what worries me about "enabling"
> > arch_can_pci_mmap_wc() on arm64. If we enable it and we have perf
> > regressions that's not OK.
> >
> > Or we *can* enable arch_can_pci_mmap_wc() but force the mellanox
> > driver (or more broadly all drivers following this message push
> > semantics) to use "something else" for WC detection.
>
> arch_can_pci_mmap_wc() really only controls the sysfs resource file
> and it seems very unclear who in userspace uses that these days.

DPDK, under some circumstances, AFAIK.
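
For reference, a rough sketch of how a userspace consumer might map
that sysfs file. This is only an illustration and is not taken from
DPDK or any other project: the helper names wc_resource_path() and
map_resource() are made up, and the device address passed in is
whatever the caller supplies. It assumes the kernel exposes
resourceN_wc for the BAR in question, which only happens for
prefetchable BARs on architectures where arch_can_pci_mmap_wc() is
true. What the resulting mapping does on the bus is exactly the
open question above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Build the sysfs path of the write-combine view of BAR 'bar' for the
 * device at 'bdf' (e.g. "0000:01:00.0", an illustrative address).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int wc_resource_path(char *buf, size_t len, const char *bdf, int bar)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d_wc",
			 bdf, bar);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Map the whole file at 'path' shared and read/write. The file size
 * reported by sysfs is used as the mapping length. Returns MAP_FAILED
 * on any error; on success stores the length in *len_out.
 */
static void *map_resource(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDWR | O_SYNC);

	if (fd < 0)
		return MAP_FAILED;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		 MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	if (p != MAP_FAILED)
		*len_out = (size_t)st.st_size;
	return p;
}
```

If resourceN_wc is absent the open() simply fails, so a caller would
typically fall back to the plain resourceN file and accept whatever
non-WC behaviour it gets.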

> vfio is now the right way to do that stuff. I don't see an obvious
> way to get WC memory in VFIO though...

Which would be a performance issue on a number of things I suppose...


