[PATCH] arm64: Enable PCI write-combine resources under sysfs

Lorenzo Pieralisi lorenzo.pieralisi at arm.com
Thu Sep 17 06:28:19 EDT 2020


On Thu, Sep 17, 2020 at 09:59:28AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2020-09-16 at 09:12 -0300, Jason Gunthorpe wrote:
> > > Also we could make this a variable rather than a constant and
> > > choose a more appropriate set of flags at boot time....
> > 
> > It is a function, so it could check the CPU ID for the known broken
> > devices and block them.
> 
> Sure, I meant in the abstract way. It's not a hot path so it doesn't
> have to be a static key.
> 
> > > > > Why would that be a regression ? 
> > > > 
> > > > Using the WC submission flow when it doesn't work costs something
> > > > like 10% performance vs using the non-WC flow.
> > > 
> > > You mean the driver uses a different path to the HW which has that
> > > overhead, not that MMIOs have that overhead right ?
> > 
> > The different path has overhead of doing extra useless MMIOs because
> > they don't combine
> 
> I see. This might have to end up being a TX2 specific hack until the
> end of times...

True - hopefully platforms that implement normal NC the way the
architecture intends will not trigger user space performance regressions.

Unfortunately if we merge this patch we _do_ know from this thread
that userspace will suffer from a perf regression on TX2.

Either we ignore it or we write some code to prevent it
(i.e. as a first step, make arch_can_pci_mmap_wc() return 0 on TX2,
possibly using the arm64 errata detection mechanism).
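
To make that first step concrete, here is a rough user-space sketch of
the check, not kernel code. The helper names (midr_is_tx2(),
sketch_arch_can_pci_mmap_wc()) are made up for illustration, and the
implementer/part values are the ones I recall from
arch/arm64/include/asm/cputype.h, so they should be verified. A real
patch would presumably go through the errata/cpufeature machinery rather
than compare MIDR values directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in [15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

/*
 * Assumed values (believed to match cputype.h, please double-check):
 * TX2 parts report either the Cavium or the older Broadcom Vulcan ID.
 */
#define IMP_BRCM		0x42u
#define IMP_CAVIUM		0x43u
#define PART_BRCM_VULCAN	0x516u
#define PART_CAVIUM_THUNDERX2	0x0afu

/* Hypothetical helper: true if the MIDR value identifies a TX2 core. */
static bool midr_is_tx2(uint32_t midr)
{
	uint32_t imp = MIDR_IMPLEMENTER(midr);
	uint32_t part = MIDR_PARTNUM(midr);

	return (imp == IMP_CAVIUM && part == PART_CAVIUM_THUNDERX2) ||
	       (imp == IMP_BRCM && part == PART_BRCM_VULCAN);
}

/*
 * Hypothetical stand-in for arch_can_pci_mmap_wc(): report 0 on TX2 so
 * the sysfs WC resource is not exposed there, 1 everywhere else.
 */
static int sketch_arch_can_pci_mmap_wc(uint32_t midr)
{
	return midr_is_tx2(midr) ? 0 : 1;
}
```

The variant and revision fields are deliberately ignored here, so every
TX2 stepping is blocked; whether that is the right granularity is an
open question.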

Adding a new IO mapping API and using it in IB drivers won't solve the TX2
problem, since we would still prefer normal NC over device GRE for "WC"
mappings and we would have to "downgrade" TX2 somehow.

Lorenzo


