[PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
Arnd Bergmann
arnd at arndb.de
Fri Mar 31 10:20:10 PDT 2023
On Fri, Mar 31, 2023, at 17:12, Robin Murphy wrote:
> On 31/03/2023 3:00 pm, Arnd Bergmann wrote:
>> On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:
>>
>> To be on the safe side, I'd have to pass a flag into
>> arch_dma_mark_clean() about coherency, to let the arm
>> implementation still require the extra dcache flush
>> for coherent DMA, while ia64 can ignore that flag.
>
> Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA
> write should be pretty much equivalent to a coherent write by another
> CPU (or indeed the local CPU itself) - nothing says that it *couldn't*
> dirty a line in a data cache above the level of unification, so in
> general the assumption must be that, yes, if coherent DMA is writing
> data intended to be executable, then it's going to want a Dcache clean
> to PoU and an Icache invalidate to PoU before trying to execute it. By
> comparison, a non-coherent DMA transfer will inherently have to
> invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot
> leave dirty data above the PoU, so only the Icache maintenance is
> required in the executable case.
Ok, makes sense. I've already started reworking my patch for it.
> (FWIW I believe the Armv8 IDC/DIC features can safely be considered
> irrelevant to 32-bit kernels)
>
> I don't know a great deal about IA-64, but it appears to be using its
> PG_arch_1 flag in a subtly different manner to Arm, namely to optimise
> out the *Icache* maintenance. So if anything, it seems IA-64 is the
> weirdo here (who'd have guessed?) where DMA manages to be *more*
> coherent than the CPUs themselves :)
I checked this in the ia64 manual, and as far as I can tell, it originally
only had one cacheflush instruction that flushes the dcache and invalidates
the icache at the same time. So flush_icache_range() actually does
both and flush_dcache_page() instead just marks the page as dirty to
ensure flush_icache_range() does not get skipped after a writing a
page from the kernel.
On later Itaniums, there is apparently a separate icache flush
instruction that gets used in flush_icache_range(), but that
still works for the DMA case that is allowed to skip the flush.
> This is all now making me think we need some careful consideration of
> whether the benefits of consolidating code outweigh the confusion of
> conflating multiple different meanings of "clean" together...
The difference in usage of PG_dcache_clean/PG_dcache_dirty/PG_arch_1
across architectures is certainly big enough that we can't just
define a a common arch_dma_mark_clean() across architectures, but
I think the idea of having a common entry point for
arch_dma_mark_clean() to be called from the dma-mapping code
to do something architecture specific after a DMA is clean still
makes sense,
Arnd
More information about the linux-arm-kernel
mailing list