[PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

Fri Jan 29 08:54:28 EST 2021

On 2021/1/29 18:26, Arnd Bergmann wrote:
> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd at kernel.org> wrote:
>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
>> <thunder.leizhen at huawei.com> wrote:
>>> On 2021/1/28 22:24, Arnd Bergmann wrote:
>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen at huawei.com> wrote:
>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>>>>> +
>>>>> +static void l3cache_maint_common(u32 range, u32 op_type)
>>>>> +{
>>>>> +       u32 reg;
>>>>> +
>>>>> +       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>>>>> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>>>>> +       reg |= range | op_type;
>>>>> +       reg |= L3_MAINT_STATUS_START;
>>>>> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>>>>
>>>> Are there contents of L3_MAINT_CTRL that need to be preserved
>>>> across calls and can not be inferred? A 'readl()' is often expensive,
>>>> so it might be more efficient if you can avoid that.
>>>
>>> Right, this readl() can be replaced with readl_relaxed(). Thanks.
>>>
>>> I'll check and correct the readl() and writel() in other places.
>>
>> What I meant is that if you want to replace them, you should provide
>> performance numbers that show how much difference this makes
>> and add comments in the source code explaining how you proved that
>> the _relaxed() version is actually correct.
> 
> Another clarification, as there are actually two independent
> points here:
> 
> * if you can completely remove the readl() above and just write a
>   hardcoded value into the register, or perhaps read the original
>   value once at boot time, that is probably a win because it
>   avoids one of the barriers in the beginning. The datasheet should
>   tell you if there are any bits in the register that have to be
>   preserved

Code coupling will become very strong.

> 
> * Regarding the _relaxed() accessors, it's a lot harder to know
>   whether that is safe, as you first have to show, in particular in case
>   any of the accesses stop being guarded by the spinlock in that
>   case, and whether there may be a case where you have to
>   serialize the memory access against accesses that are still in the
>   store queue or prefetched.
> 
> Whether this matters at all depends mostly on the type of devices
> you are driving on your SoC. If you have any high-speed network
> interfaces that are unable to do cache coherent DMA, any extra
> instruction here may impact the number of packets you can transfer,
> but if all your high-speed devices are connected to a coherent
> interconnect, I would just go with the obvious approach and use
> the safe MMIO accessors everywhere.

In fact, this driver has been running on an earlier version for several years
and has not received any feedback about the performance issue. So I didn't
try to optimize it when I first sent these patches. I had to reconsider it
until you noticed it.

How about keeping it unchanged for the moment? It'll take a lot of time and
energy to retest.

> 
>        Arnd
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
>