[PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

Fri Jan 29 05:53:20 EST 2021

On Fri, Jan 29, 2021 at 11:33 AM Russell King - ARM Linux admin
<linux at armlinux.org.uk> wrote:
>
> On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote:
> > Another clarification, as there are actually two independent
> > points here:
> >
> > * if you can completely remove the readl() above and just write a
> >   hardcoded value into the register, or perhaps read the original
> >   value once at boot time, that is probably a win because it
> >   avoids one of the barriers in the beginning. The datasheet should
> >   tell you if there are any bits in the register that have to be
> >   preserved
> >
> > * Regarding the _relaxed() accessors, it's a lot harder to know
> >   whether that is safe, as you first have to show that it is, in
> >   particular in case any of the accesses stop being guarded by the
> >   spinlock, and rule out any case where you have to serialize the
> >   memory access against accesses that are still in the store queue
> >   or prefetched.
> >
> > Whether this matters at all depends mostly on the type of devices
> > you are driving on your SoC. If you have any high-speed network
> > interfaces that are unable to do cache coherent DMA, any extra
> > instruction here may impact the number of packets you can transfer,
> > but if all your high-speed devices are connected to a coherent
> > interconnect, I would just go with the obvious approach and use
> > the safe MMIO accessors everywhere.
>
> For L2 cache code, I would say the opposite, actually, because it is
> all too easy to get into a deadlock otherwise.
>
> If you implement the sync callback, that will be called from every
> non-relaxed accessor, which means if you need to take some kind of
> lock in the sync callback and elsewhere in the L2 cache code, you will
> definitely deadlock.

Fair enough. I mentioned the sync callback as the reason for
using the relaxed accessor in l2x0 in my first reply. Clearly if
there was a sync callback here, it would immediately deadlock
when calling back into sync() from readl()/writel().
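
To make the recursion concrete, here is a rough userspace model of it.
None of these names come from the patch: the lock is just a flag, and
model_writel() only assumes what Russell describes above, namely that
the non-relaxed accessor ends up in the sync callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; every name here is hypothetical. */
static bool model_lock_held;      /* stands in for a non-recursive spinlock */
static bool model_would_deadlock; /* set when the lock is taken while held */
static uint32_t model_reg;        /* stands in for one controller register */

static bool model_trylock(void)
{
	if (model_lock_held) {
		/* a real spinlock would spin forever at this point */
		model_would_deadlock = true;
		return false;
	}
	model_lock_held = true;
	return true;
}

static void model_unlock(void)
{
	model_lock_held = false;
}

/* Hypothetical sync callback that takes the same lock as the flush path. */
static void model_sync(void)
{
	if (model_trylock())
		model_unlock();
}

/* Relaxed accessor: plain register write, no barrier, no sync. */
static void model_writel_relaxed(uint32_t val)
{
	model_reg = val;
}

/* Non-relaxed accessor: the barrier before the write reaches sync(). */
static void model_writel(uint32_t val)
{
	model_sync();
	model_reg = val;
}

/* Flush path holding the lock; returns true if it would have deadlocked. */
bool model_flush(bool relaxed)
{
	model_lock_held = false;
	model_would_deadlock = false;

	model_trylock();
	if (relaxed)
		model_writel_relaxed(1);
	else
		model_writel(1);
	model_unlock();

	return model_would_deadlock;
}
```

With the relaxed accessor the flush path never re-enters the lock; with
the non-relaxed one it does, which is the deadlock described above.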

> It is safer to put explicit barriers where it is necessary.
>
> Also remember that the barrier in readl() etc is _after_ the read, not
> before, and the barrier in writel() is _before_ the write, not after.
> The point is to ensure that DMA memory accesses are properly ordered
> with the IO-accessing instructions.
>
> So, using readl_relaxed() with a read-modify-write is entirely sensible
> provided you do not access DMA memory in between.

The part I was not sure about is what happens when you have
a store to memory immediately before flushing the cache, and there
are no barriers in between. Is there a possibility for the MMIO store to
cause the cache to be flushed before the prior memory store has
made it into the cache? My guess would be that this cannot happen,
but I'm not sure. If the code gets changed to raw_writel(), I think this
should be documented next to the actual raw_writel(), explaining either
the presence or the absence of such a barrier.
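
As a sketch of the kind of comment I mean, here is a userspace model
of the "barrier present" variant. The register, the buffer and all
model_* names are made up for illustration; the fence stands in for
mb() and the plain store for __raw_writel(). Whether the barrier is
actually required on this hardware is exactly the open question, so
treat this as an illustration of the documentation, not as the answer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for a full barrier such as the kernel's mb(). */
#define model_mb() atomic_thread_fence(memory_order_seq_cst)

/* Stand-in for a raw accessor: plain store, no implied barrier. */
static inline void model_raw_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

static volatile uint32_t model_flush_reg; /* hypothetical flush trigger */
uint32_t model_buf[4];                    /* normal memory to be flushed */

uint32_t model_store_then_flush(uint32_t data)
{
	model_buf[0] = data;

	/*
	 * Order the memory store above before the MMIO store that
	 * triggers the flush. If the hardware turns out to guarantee
	 * this ordering by itself, drop the barrier and explain here
	 * why it is not needed.
	 */
	model_mb();
	model_raw_writel(1, &model_flush_reg);

	return model_flush_reg;
}
```

Either way, the comment sits directly above the raw write, so the next
person touching the accessor sees why the barrier is or is not there.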

      Arnd