[PATCH] arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()

Thu Sep 8 03:32:15 PDT 2022

On Wed, Sep 07, 2022 at 05:25:43PM +0100, Will Deacon wrote:
> On Wed, Sep 07, 2022 at 10:27:45AM +0100, Robin Murphy wrote:
> > On 2022-09-07 10:03, Christoph Hellwig wrote:
> > > On Tue, Aug 23, 2022 at 01:21:11PM +0100, Will Deacon wrote:
> > > > arch_dma_prep_coherent() is called when preparing a non-cacheable region
> > > > for a consistent DMA buffer allocation. Since the buffer pages may
> > > > previously have been written via a cacheable mapping and consequently
> > > > allocated as dirty cachelines, the purpose of this function is to remove
> > > > these dirty lines from the cache, writing them back so that the
> > > > non-coherent device is able to see them.
> > > 
> > > Yes.
> > > 
> > > > I'm slightly wary about this change as other architectures seem to do
> > > > clean+invalidate here, but I'd like to hear what others think in any
> > > > case.
> > > 
> > > If arm64 is fine with having clean but present cachelines when creating
> > > an uncached mapping for a cache line, the invalidate is not required.
> > > 
> > > But isn't it better for the cache if these by definition useless
> > > cachelines get evicted?
> > > 
> > > My biggest concern here is that we're now moving from consolidating
> > > these semantics in all the different architectures to different ones,
> > > making a centralization of the policies even harder.
> > 
> > FWIW I agree with Ard in not being entirely confident with this change. The
> > impression I had (which may be wrong) was that the architecture never
> > actually ruled out unexpected cache hits in the case of mismatched
> > attributes, it just quietly stopped mentioning it at all. And even if the
> > architecture did rule them out, how confident are we about errata that might
> > still allow them to happen?

For the erratum you have in mind, invalidation doesn't help either (as
you know already).

The architecture does rule out unexpected cache hits, though not by
stating this explicitly but rather only listing the cases where one
needs cache maintenance in the presence of mismatched aliases. Such
unexpected cache hits may happen micro-architecturally but are not
visible to software (in theory and ignoring timing side-channels).

> > It seems like we don't stand to gain much by removing the invalidation -
> > since the overhead will still be in the clean - other than the potential for
> > a slightly increased chance of rare and hard-to-debug memory corruption :/
> 
> I just find it odd that we rely on the CPU not hitting the cacheable alias
> in other places, yet we're using an invalidation for this path. It's
> inconsistent and difficult to explain to people. As I said, I'm happy to
> add a comment to the existing code instead of the change here, but I don't
> know what to say other than something like:
> 
>   /*
>    * The architecture says we only need a clean here, but invalidate as
>    * well just in case.
>    */

The architecture requires invalidate or clean while also stating that
clean+invalidate can be used instead, so I don't think there's much to
justify. From the mismatched attributes section:

1. If the mismatched attributes for a memory location all assign the
   same shareability attribute to a Location that has a cacheable
   attribute, any loss of uniprocessor semantics, ordering, or coherency
   within a shareability domain can be avoided by use of software cache
   management. To do so, software must use the techniques that are
   required for the software management of the ordering or coherency of
   cacheable Locations between agents in different shareability domains.
   This means:
   - Before writing to a cacheable Location not using the Write-Back
     attribute, software must invalidate, or clean, a Location from the
     caches if any agent might have written to the Location with the
     Write-Back attribute. This avoids the possibility of overwriting
     the Location with stale data.
   - After writing to a cacheable Location with the Write-Back
     attribute, software must clean the Location from the caches, to
     make the write visible to external memory.
   ...

When creating the non-cacheable alias for DMA, what we need is the first
bullet point above. A clean would be required if we wrote something via
the write-back mapping and we want that data to be visible (not needed
if the zeroing is done via the non-cacheable alias).

However, it just crossed my mind that we can't always drop the
CPU-written (meta)data with an invalidate even if the zeroing is done on
the non-cacheable alias. With MTE+kasan, the page allocator may colour
the memory (via the cacheable alias). Even if non-cacheable accesses are
not checked by MTE, when freeing it back it could potentially confuse
the allocator if the tags were lost. Similarly for the streaming DMA and
this could have been worse but luckily we had c50f11c6196f ("arm64: mm:
Don't invalidate FROM_DEVICE buffers at start of DMA transfer") for
other reasons.

So yeah, I think a clean is needed here or clean+invalidate but not
invalidate only due to the addition of MTE.

-- 
Catalin