[PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

Fri Jan 29 05:26:38 EST 2021

On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd at kernel.org> wrote:
> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
> <thunder.leizhen at huawei.com> wrote:
> > On 2021/1/28 22:24, Arnd Bergmann wrote:
> > > On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen at huawei.com> wrote:
> > >> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> > >> +
> > >> +static void l3cache_maint_common(u32 range, u32 op_type)
> > >> +{
> > >> +       u32 reg;
> > >> +
> > >> +       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
> > >> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
> > >> +       reg |= range | op_type;
> > >> +       reg |= L3_MAINT_STATUS_START;
> > >> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
> > >
> > > Are there contents of L3_MAINT_CTRL that need to be preserved
> > > across calls and can not be inferred? A 'readl()' is often expensive,
> > > so it might be more efficient if you can avoid that.
> >
> > Right, this readl() can be replaced with readl_relaxed(). Thanks.
> >
> > I'll check and correct the readl() and writel() in other places.
>
> What I meant is that if you want to replace them, you should provide
> performance numbers that show how much difference this makes
> and add comments in the source code explaining how you proved that
> the _relaxed() version is actually correct.

Another clarification, as there are actually two independent
points here:

* if you can completely remove the readl() above and just write a
  hardcoded value into the register, or perhaps read the original
  value once at boot time, that is probably a win because it
  avoids one of the barriers in the beginning. The datasheet should
  tell you if there are any bits in the register that have to be
  preserved

* Regarding the _relaxed() accessors, it's a lot harder to know
  whether that is safe, as you first have to show, in particular in case
  any of the accesses stop being guarded by the spinlock in that
  case, and whether there may be a case where you have to
  serialize the memory access against accesses that are still in the
  store queue or prefetched.

Whether this matters at all depends mostly on the type of devices
you are driving on your SoC. If you have any high-speed network
interfaces that are unable to do cache coherent DMA, any extra
instruction here may impact the number of packets you can transfer,
but if all your high-speed devices are connected to a coherent
interconnect, I would just go with the obvious approach and use
the safe MMIO accessors everywhere.

       Arnd