[PATCH] arm64: Enable PCI write-combine resources under sysfs
Lorenzo Pieralisi
lorenzo.pieralisi at arm.com
Thu Sep 17 06:28:19 EDT 2020
On Thu, Sep 17, 2020 at 09:59:28AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2020-09-16 at 09:12 -0300, Jason Gunthorpe wrote:
> > > Also we could make this a variable rather than a constant and choose
> > > a more appropriate set of flags at boot time...
> >
> > It is a function, so it could check the CPU ID for the known broken
> > devices and block them.
>
> Sure, I meant in the abstract way. It's not a hot path so it doesn't
> have to be a static key.
>
> > > > > Why would that be a regression ?
> > > >
> > > > Using the WC submission flow when it doesn't work costs something
> > > > like 10% performance vs using the non-WC flow.
> > >
> > > You mean the driver uses a different path to the HW which has that
> > > overhead, not that MMIOs have that overhead, right?
> >
> > The different path has overhead of doing extra useless MMIOs because
> > they don't combine
>
> I see. This might have to end up being a TX2 specific hack until the
> end of times...
True - hopefully, on platforms that implement normal NC the architectural
way, this patch will not trigger user space performance regressions.

Unfortunately, if we merge this patch, we _do_ know from this thread
that userspace will suffer a perf regression on TX2.
Either we ignore it or we write some code to prevent it
(i.e., as a first step, make arch_can_pci_mmap_wc() return 0 on TX2,
possibly using the arm64 errata detection mechanism).
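
To make that first step concrete, here is a rough userspace sketch of the
check, not kernel code. It assumes TX2 can be recognised from MIDR_EL1 alone
(Cavium ThunderX2 or Broadcom Vulcan part numbers, which I believe match
asm/cputype.h). The names sketch_arch_can_pci_mmap_wc() and midr_is_tx2() are
made up for illustration; a real patch would presumably hook into the errata
detection framework rather than open-code the MIDR match.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 fields: implementer is bits [31:24], part number is bits [15:4]. */
#define SKETCH_MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define SKETCH_MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

/* Assumed to match the values in arch/arm64/include/asm/cputype.h. */
#define SKETCH_IMP_CAVIUM		0x43u
#define SKETCH_IMP_BRCM			0x42u
#define SKETCH_PART_THUNDERX2		0x0afu
#define SKETCH_PART_VULCAN		0x516u

/* Hypothetical helper: true if the MIDR value identifies a TX2 core. */
bool midr_is_tx2(uint32_t midr)
{
	uint32_t imp = SKETCH_MIDR_IMPLEMENTER(midr);
	uint32_t part = SKETCH_MIDR_PARTNUM(midr);

	return (imp == SKETCH_IMP_CAVIUM && part == SKETCH_PART_THUNDERX2) ||
	       (imp == SKETCH_IMP_BRCM && part == SKETCH_PART_VULCAN);
}

/*
 * Hypothetical stand-in for arch_can_pci_mmap_wc(): return 0 on TX2 so
 * that the sysfs WC resource is not offered there, 1 everywhere else.
 */
int sketch_arch_can_pci_mmap_wc(uint32_t midr)
{
	return midr_is_tx2(midr) ? 0 : 1;
}
```

Since this is not a hot path, a plain function call like this (rather than a
static key) should be good enough.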

Adding a new IO mapping API and using it in IB drivers won't solve the TX2
problem, since we still prefer normal NC over device GRE for "WC"
mappings and we would have to "downgrade" TX2 somehow.

Lorenzo