[PATCH] arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()

Will Deacon will at kernel.org
Wed Sep 7 09:25:43 PDT 2022


On Wed, Sep 07, 2022 at 10:27:45AM +0100, Robin Murphy wrote:
> On 2022-09-07 10:03, Christoph Hellwig wrote:
> > On Tue, Aug 23, 2022 at 01:21:11PM +0100, Will Deacon wrote:
> > > arch_dma_prep_coherent() is called when preparing a non-cacheable region
> > > for a consistent DMA buffer allocation. Since the buffer pages may
> > > previously have been written via a cacheable mapping and consequently
> > > allocated as dirty cachelines, the purpose of this function is to remove
> > > these dirty lines from the cache, writing them back so that the
> > > non-coherent device is able to see them.
> > 
> > Yes.
> > 
> > > I'm slightly wary about this change as other architectures seem to do
> > > clean+invalidate here, but I'd like to hear what others think in any
> > > case.
> > 
> > If arm64 is fine with having clean but present cachelines when creating
> > an uncached mapping for a cache line, the invalidate is not required.
> > 
> > But isn't it better for the cache if these by definition useless
> > cachelines get evicted?
> > 
> > My biggest concern here is that we're now moving from consolidating
> > these semantics in all the different architectures to different ones,
> > making a centralization of the policies even harder.
> 
> FWIW I agree with Ard in not being entirely confident with this change. The
> impression I had (which may be wrong) was that the architecture never
> actually ruled out unexpected cache hits in the case of mismatched
> attributes, it just quietly stopped mentioning it at all. And even if the
> architecture did rule them out, how confident are we about errata that might
> still allow them to happen?

The architecture does rule this out as long as the cacheable alias is not
dirty (i.e. has not been written to). If this were not the case, then our
non-cacheable DMA buffers would be unreliable because speculative fills
from the linear map could "mask" the actual DMA data. The presence of the
linear map also means that the lines could immediately be refetched by
speculation, rendering the invalidation pointless in the current code.
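
For reference, the change under discussion is effectively the following
(a sketch rather than the exact hunk; dcache_clean_poc() and
dcache_clean_inval_poc() are the arm64 helpers for "clean" and
"clean+invalidate" to the point of coherency):

  void arch_dma_prep_coherent(struct page *page, size_t size)
  {
          unsigned long start = (unsigned long)page_address(page);

  -       dcache_clean_inval_poc(start, start + size);
  +       dcache_clean_poc(start, start + size);
  }

The clean is still required to write back any dirty lines allocated via
the cacheable linear alias before the device sees the buffer; only the
invalidation of the (by then clean) lines is dropped.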

Regardless of this patch, a CPU with an erratum violating the
architectural guarantee would be broken anyway; it is the presence of the
linear alias itself that would need to be resolved.

The architecture text is buried in the infamous "B2.8 Mismatched memory
attributes" section of the Arm ARM, but it's also worth talking to Mark R,
Catalin and the handful of folks in ATG who grok this to get a summary
which is easier to digest.

> It seems like we don't stand to gain much by removing the invalidation -
> since the overhead will still be in the clean - other than the potential for
> a slightly increased chance of rare and hard-to-debug memory corruption :/

I just find it odd that we rely on the CPU not hitting the cacheable alias
in other places, yet we're using an invalidation for this path. It's
inconsistent and difficult to explain to people. As I said, I'm happy to
add a comment to the existing code instead of making the change here, but
I don't know what to say other than something like:

  /*
   * The architecture says we only need a clean here, but invalidate as
   * well just in case.
   */

Which feels deeply unsatisfactory.
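
A comment that actually reflects the architectural argument above would
read more like the following (wording mine), although it ends up
justifying the removal of the invalidation rather than its retention:

  /*
   * Clean to the PoC so that dirty lines written via the cacheable
   * linear alias are visible to the non-coherent device. No invalidate:
   * the architecture rules out unexpected hits against a clean
   * cacheable alias, and speculative fills via the linear map could
   * repopulate the lines immediately anyway.
   */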

Will


