[PATCH] arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()

Robin Murphy robin.murphy at arm.com
Wed Sep 7 10:50:20 PDT 2022


On 2022-09-07 17:25, Will Deacon wrote:
> On Wed, Sep 07, 2022 at 10:27:45AM +0100, Robin Murphy wrote:
>> On 2022-09-07 10:03, Christoph Hellwig wrote:
>>> On Tue, Aug 23, 2022 at 01:21:11PM +0100, Will Deacon wrote:
>>>> arch_dma_prep_coherent() is called when preparing a non-cacheable region
>>>> for a consistent DMA buffer allocation. Since the buffer pages may
>>>> previously have been written via a cacheable mapping and consequently
>>>> allocated as dirty cachelines, the purpose of this function is to remove
>>>> these dirty lines from the cache, writing them back so that the
>>>> non-coherent device is able to see them.
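
[ For anyone joining the thread late: the change being debated is
roughly this one-liner in arch/arm64/mm/dma-mapping.c, replacing the
clean+invalidate with a clean-only operation:

 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
	unsigned long start = (unsigned long)page_address(page);

-	dcache_clean_inval_poc(start, start + size);
+	dcache_clean_poc(start, start + size);
 }
]
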
>>>
>>> Yes.
>>>
>>>> I'm slightly wary about this change as other architectures seem to do
>>>> clean+invalidate here, but I'd like to hear what others think in any
>>>> case.
>>>
>>> If arm64 is fine with having clean but present cachelines when creating
>>> an uncached mapping for a cache line, the invalidate is not required.
>>>
>>> But isn't it better for the cache if these by definition useless
>>> cachelines get evicted?
>>>
>>> My biggest concern here is that instead of consolidating these
>>> semantics across all the different architectures, we're now letting
>>> them diverge, making a centralization of the policies even harder.
>>
>> FWIW I agree with Ard in not being entirely confident in this change. The
>> impression I had (which may be wrong) was that the architecture never
>> actually ruled out unexpected cache hits in the case of mismatched
>> attributes; it just quietly stopped mentioning them at all. And even if the
>> architecture did rule them out, how confident are we about errata that might
>> still allow them to happen?
> 
> The architecture does rule this out as long as the cacheable alias is not
> dirty (i.e. has not been written to). If this was not the case, then our
> non-cacheable DMA buffers would be unreliable because speculative fills
> from the linear map could "mask" the actual DMA data. The presence of the
> linear map also means that the lines could immediately be refetched by
> speculation, rendering the invalidation pointless in the current code.
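
[ To make the aliasing concrete, this is roughly what the dma-direct
remap path does when allocating a buffer for a non-coherent device
(simplified from kernel/dma/direct.c: error handling omitted, and the
pgprot helper shown as pgprot_dmacoherent() for brevity):

	/* buffer pages, already mapped cacheable in the linear map */
	struct page *page = __dma_direct_alloc_pages(...);

	/* write back any dirty lines via the cacheable linear-map alias */
	arch_dma_prep_coherent(page, size);

	/* create the non-cacheable mapping the driver will actually use */
	ret = dma_common_contiguous_remap(page, size,
			pgprot_dmacoherent(PAGE_KERNEL),
			__builtin_return_address(0));

The cacheable linear-map alias of "page" still exists afterwards, so
speculative fetches through it can repopulate the caches at any time. ]
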
> 
> Regardless of this patch, a CPU erratum violating the architectural
> guarantee would be broken. It is the presence of the linear alias that
> would need to be resolved.

Oh have I got a treat for you... but that's not relevant here ;)

> The architecture text is buried in the infamous "B2.8 Mismatched memory
> attributes" section of the Arm ARM, but it's also worth talking to Mark R,
> Catalin and the handful of folks in ATG who grok this to get a summary
> which is easier to digest.

I'm looking at DDI0487I.a and nothing stands out as different from 
previous readings. In particular,

"A read of the memory Location by one agent might not return the value 
most recently written to that memory Location by the same agent"

seems entirely consistent with non-cacheable reads being allowed to 
unexpectedly hit clean (but stale) cache entries even if non-cacheable 
writes don't. And frankly I don't see how

"There might be a loss of coherency when multiple agents attempt to 
access a memory Location."

helps rule *anything* out. The bit about writeback does seem to imply 
that non-cacheable writes can't dirty a cache entry, so I can at least 
accept that unexpected hits for writes are off the table. Furthermore, 
the bit about mismatched shareability - which I believe applies here 
since our cacheable mappings are ISh while non-cacheable is inherently 
OSh - explicitly mentions:

"Software running on a PE cleans and invalidates a Location from cache 
before and after each read or write to that Location by that PE."

If the architecture saying to clean and invalidate is supposed to be a 
guarantee that invalidation is unnecessary, I give up.
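
[ For concreteness, the distinction being debated is clean (DC CVAC)
versus clean+invalidate (DC CIVAC) by VA to the PoC. A minimal C
sketch, assuming a fixed 64-byte line size - the kernel's real helpers
dcache_clean_poc() and dcache_clean_inval_poc() are asm routines in
arch/arm64/mm/cache.S which derive the line size from CTR_EL0:

	#define CLINE	64UL

	/* write dirty lines back to the PoC, but leave them valid */
	static void clean_poc(unsigned long start, unsigned long end)
	{
		unsigned long addr;

		for (addr = start & ~(CLINE - 1); addr < end; addr += CLINE)
			asm volatile("dc cvac, %0" :: "r" (addr) : "memory");
		asm volatile("dsb sy" ::: "memory");
	}

	/* write dirty lines back to the PoC and also evict them */
	static void clean_inval_poc(unsigned long start, unsigned long end)
	{
		unsigned long addr;

		for (addr = start & ~(CLINE - 1); addr < end; addr += CLINE)
			asm volatile("dc civac, %0" :: "r" (addr) : "memory");
		asm volatile("dsb sy" ::: "memory");
	}
]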

>> It seems like we don't stand to gain much by removing the invalidation -
>> since the overhead will still be in the clean - other than the potential for
>> a slightly increased chance of rare and hard-to-debug memory corruption :/
> 
> I just find it odd that we rely on the CPU not hitting the cacheable alias
> in other places, yet we're using an invalidation for this path. It's
> inconsistent and difficult to explain to people. As I said, I'm happy to
> add a comment to the existing code instead of the change here, but I don't
> know what to say other than something like:
> 
>    /*
>     * The architecture says we only need a clean here, but invalidate as
>     * well just in case.
>     */
> 
> Which feels deeply unsatisfactory.

Well, the chances of the alias being speculated back into the caches by
accesses to nearby linear map pages still seem a lot smaller than the
chances of the caches hanging on to lines we've just finished accessing
with cacheable attributes. Christoph makes a good point that, at worst,
the invalidate is a concise hint to caches with an LRU replacement
policy that might otherwise assume we'll be coming back for our
freshly-written clean zeros.

Cheers,
Robin.


