[PATCH] arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()

Robin Murphy robin.murphy at arm.com
Thu Sep 8 04:32:42 PDT 2022


On 2022-09-08 11:32, Catalin Marinas wrote:
> On Wed, Sep 07, 2022 at 05:25:43PM +0100, Will Deacon wrote:
>> On Wed, Sep 07, 2022 at 10:27:45AM +0100, Robin Murphy wrote:
>>> On 2022-09-07 10:03, Christoph Hellwig wrote:
>>>> On Tue, Aug 23, 2022 at 01:21:11PM +0100, Will Deacon wrote:
>>>>> arch_dma_prep_coherent() is called when preparing a non-cacheable region
>>>>> for a consistent DMA buffer allocation. Since the buffer pages may
>>>>> previously have been written via a cacheable mapping and consequently
>>>>> allocated as dirty cachelines, the purpose of this function is to remove
>>>>> these dirty lines from the cache, writing them back so that the
>>>>> non-coherent device is able to see them.
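
For context, the change under discussion boils down to roughly this
shape (a sketch rather than the verbatim diff; dcache_clean_inval_poc()
and dcache_clean_poc() are arm64's clean+invalidate and clean-only
maintenance helpers):

     void arch_dma_prep_coherent(struct page *page, size_t size)
     {
             unsigned long start = (unsigned long)page_address(page);

    -        /* clean+invalidate: write back dirty lines, then evict them */
    -        dcache_clean_inval_poc(start, start + size);
    +        /* clean only: write back dirty lines, leave them in the cache */
    +        dcache_clean_poc(start, start + size);
     }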
>>>>
>>>> Yes.
>>>>
>>>>> I'm slightly wary about this change as other architectures seem to do
>>>>> clean+invalidate here, but I'd like to hear what others think in any
>>>>> case.
>>>>
>>>> If arm64 is fine with having clean but present cachelines when creating
>>>> an uncached mapping for a cache line, the invalidate is not required.
>>>>
>>>> But isn't it better for the cache if these by definition useless
>>>> cachelines get evicted?
>>>>
>>>> My biggest concern here is that we're now moving away from
>>>> consolidating these semantics across the different architectures
>>>> and towards divergent ones, making centralization of the policies
>>>> even harder.
>>>
>>> FWIW I agree with Ard in not being entirely confident about this change.
>>> The impression I had (which may be wrong) was that the architecture never
>>> actually ruled out unexpected cache hits in the case of mismatched
>>> attributes; it just quietly stopped mentioning them at all. And even if
>>> the architecture did rule them out, how confident are we about errata that
>>> might still allow them to happen?
> 
> For the erratum you have in mind, invalidation doesn't help either (as
> you know already).
> 
> The architecture does rule out unexpected cache hits, though not by
> stating this explicitly but rather by listing only the cases where one
> needs cache maintenance in the presence of mismatched aliases. Such
> unexpected cache hits may happen micro-architecturally but are not
> visible to software (in theory, and ignoring timing side-channels).
> 
>>> It seems like we don't stand to gain much by removing the invalidation -
>>> since the overhead will still be in the clean - other than a slightly
>>> increased chance of rare and hard-to-debug memory corruption :/
>>
>> I just find it odd that we rely on the CPU not hitting the cacheable alias
>> in other places, yet we're using an invalidation for this path. It's
>> inconsistent and difficult to explain to people. As I said, I'm happy to
>> add a comment to the existing code instead of the change here, but I don't
>> know what to say other than something like:
>>
>>    /*
>>     * The architecture says we only need a clean here, but invalidate as
>>     * well just in case.
>>     */
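
In context, that would presumably end up looking something like this
sketch, keeping the existing clean+invalidate:

    void arch_dma_prep_coherent(struct page *page, size_t size)
    {
            unsigned long start = (unsigned long)page_address(page);

            /*
             * The architecture says we only need a clean here, but
             * invalidate as well just in case.
             */
            dcache_clean_inval_poc(start, start + size);
    }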
> 
> The architecture requires invalidate or clean while also stating that
> clean+invalidate can be used instead, so I don't think there's much to
> justify. From the mismatched attributes section:
> 
> 1. If the mismatched attributes for a memory location all assign the
>     same shareability attribute to a Location that has a cacheable
>     attribute,

This is not describing our case, though. We do have a cacheable
attribute in at least one alias, but the shareability is *not* the same
across all aliases, so at face value it clearly does not apply.

If it's intended to mean that the shareability of all aliases with 
cacheable attributes is the same and the shareability of aliases without 
cacheable attributes doesn't matter, then it should bloody well say 
that, not something entirely contradictory :(

I'm basically just shaking my fists at the sky here, but it irks me no
end to be writing patches based on what half a dozen people (myself
included, in some cases) happen to know an architecture is supposed to
mean, rather than what it actually says.

Cheers,
Robin.

> any loss of uniprocessor semantics, ordering, or coherency
>     within a shareability domain can be avoided by use of software cache
>     management. To do so, software must use the techniques that are
>     required for the software management of the ordering or coherency of
>     cacheable Locations between agents in different shareability domains.
>     This means:
>     - Before writing to a cacheable Location not using the Write-Back
>       attribute, software must invalidate, or clean, a Location from the
>       caches if any agent might have written to the Location with the
>       Write-Back attribute. This avoids the possibility of overwriting
>       the Location with stale data.
>     - After writing to a cacheable Location with the Write-Back
>       attribute, software must clean the Location from the caches, to
>       make the write visible to external memory.
>     ...
> 
> When creating the non-cacheable alias for DMA, what we need is the first
> bullet point above. A clean would be required if we wrote something via
> the write-back mapping and we want that data to be visible (not needed
> if the zeroing is done via the non-cacheable alias).
> 
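
To make that ordering concrete, the coherent-allocation path amounts to
something like the sketch below (simplified; the names approximate the
kernel/dma and arm64 code rather than quoting it exactly):

    /* Backing pages may have been dirtied earlier through the
     * cacheable linear map.
     */
    page = __dma_direct_alloc_pages(dev, size, gfp);

    /* Zeroing via the cacheable alias creates dirty (zeroed) lines. */
    memset(page_address(page), 0, size);

    /* Clean to the PoC so the zeros (and any other dirty lines)
     * reach memory - this is arch_dma_prep_coherent()'s job.
     */
    arch_dma_prep_coherent(page, size);

    /* Only now is the non-cacheable CPU alias created. */
    ret = dma_common_contiguous_remap(page, size,
                                      pgprot_dmacoherent(PAGE_KERNEL),
                                      __builtin_return_address(0));
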
> However, it just crossed my mind that we can't always drop the
> CPU-written (meta)data with an invalidate, even if the zeroing is done
> on the non-cacheable alias. With MTE+kasan, the page allocator may
> colour the memory (via the cacheable alias). Even though non-cacheable
> accesses are not checked by MTE, losing the tags could confuse the
> allocator when the buffer is freed back. The same applies to the
> streaming DMA case, where this could have been worse, but luckily we
> already had c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE
> buffers at start of DMA transfer") for other reasons.
> 
> So yeah, I think a clean (or clean+invalidate) is needed here, but not
> an invalidate alone, due to the addition of MTE.
> 
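
The invalidate-only hazard described above would be along these lines
(a hypothetical sequence for illustration, not actual kernel code):

    /* The page allocator colours the page through the cacheable
     * alias, so the allocation tags may sit in dirty cache lines.
     */
    page = alloc_pages(GFP_KERNEL, order);
    start = (unsigned long)page_address(page);

    /* A hypothetical invalidate-only step discards those dirty
     * lines - data *and* tags - before they ever reach memory.
     */
    dcache_inval_poc(start, start + size);

    /* On free, the allocator can then observe stale/lost tags. */
    __free_pages(page, order);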


