[PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA

Sat May 6 00:25:41 PDT 2023

On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd at arndb.de> wrote:
>
> On Fri, May 5, 2023, at 07:47, Guo Ren wrote:
> > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd at kernel.org> wrote:
>
> >>
> >> riscv also invalidates the caches before the transfer, which does
> >> not appear to serve any purpose.
> > Yes, we can't guarantee the CPU pre-load cache lines randomly during
> > dma working.
> >
> > But I've two purposes to keep invalidates before dma transfer:
> >  - We clearly tell the CPU these cache lines are invalid. The caching
> > algorithm would use these invalid slots first instead of replacing
> > valid ones.
> >  - Invalidating is very cheap. Actually, flush and clean have the same
> > performance in our machine.
>
> The main purpose of the series was to get consistent behavior on
> all machines, so I really don't want a custom optimization on
> one architecture. You make a good point about cacheline reuse
> after invalidation, but if we do that, I'd suggest doing this
> across all architectures.
Yes, invalidation of DMA_FROM_DEVICE-for_device is a proposal for all
architectures.

>
> > So, how about:
> >
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > index d919efab6eba..2c52fbc15064 100644
> > --- a/arch/riscv/mm/dma-noncoherent.c
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> >                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> >                 break;
> >         case DMA_FROM_DEVICE:
> > -               ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> > -               break;
> >         case DMA_BIDIRECTIONAL:
> >                 ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> >                 break;
>
> This is something we can consider. Unfortunately, this is something
> that no architecture (except pa-risc, which has other problems)
> does at the moment, so we'd probably need to have a proper debate
> about this.
>
> We already have two conflicting ways to handle DMA_FROM_DEVICE,
> either invalidate/invalidate, or clean/invalidate. I can see
I vote to invalidate/invalidate.

My key point is to let DMA_FROM_DEVICE-for_device invalidate, and
DMA_BIDIRECTIONAL contains DMA_FROM_DEVICE.
So I also agree:
@@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
                 break;
         case DMA_FROM_DEVICE:
 -               ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 +              ALT_CMO_OP(invalidate, vaddr, size, riscv_cbom_block_size);
                 break;
         case DMA_BIDIRECTIONAL:
                 ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
                 break;

> that flush/invalidate may be a sensible option as well, but I'd
> want to have that discussion after the series is complete, so
> we can come to a generic solution that has the same documented
> behavior across all architectures.
Yes, I agree to unify them into a generic solution first. My proposal
could be another topic in the future.
For that purpose, I give
Acked-by: Guo Ren <guoren at kernel.org>

>
> In particular, if we end up moving arm64 and riscv back to the
> traditional invalidate/invalidate for DMA_FROM_DEVICE and
> document that driver must not rely on buffers getting cleaned
After invalidation, the cache lines are also cleaned, right? So why do
we need to document it additionally?

> before a partial DMA_FROM_DEVICE, the question between clean
> or flush becomes moot as well.
>
> > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> >                 break;
> >         case DMA_FROM_DEVICE:
> >         case DMA_BIDIRECTIONAL:
> >                 /* I'm not sure all drivers have guaranteed cacheline
> > alignment. If not, this inval would cause problems */
> > -               ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> > +               ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> >                 break;
>
> This is my original patch, and I would not mix it with the other
> change. The problem with non-aligned DMA_BIDIRECTIONAL buffers in
> is that both flush and inval would be wrong if you get simultaneous
> writes from device and cpu to the same cache line, so there is
> no way to win this. Using inval instead of flush would at least
> work if the CPU data in the cacheline is read-only from the CPU,
> so that seems better than something that is always wrong.
If CPU data in the cacheline is read-only, the cacheline would never
be dirty. Yes, It's always safe.
Okay, I agree we must keep cache-line-aligned. I comment it here, just
worry some dirty drivers couldn't work with the "invalid mechanism"
because of the CPU data corruption, and device data in the cacheline
is useless.

>
> The documented API is that sharing the cache line is not allowed
> at all, so anything that would observe a difference between the
> two is also a bug. One idea that we have considered already is
> that we could overwrite the unused bits of the cacheline with
> poison values and/or mark them as invalid using KASAN for debugging
> purposes, to find drivers that already violate this.
>
>       Arnd



-- 
Best Regards
 Guo Ren