[PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
Leizhen (ThunderTown)
thunder.leizhen at huawei.com
Fri Jan 29 08:54:28 EST 2021
On 2021/1/29 18:26, Arnd Bergmann wrote:
> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd at kernel.org> wrote:
>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
>> <thunder.leizhen at huawei.com> wrote:
>>> On 2021/1/28 22:24, Arnd Bergmann wrote:
>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen at huawei.com> wrote:
>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>>>>> +
>>>>> +static void l3cache_maint_common(u32 range, u32 op_type)
>>>>> +{
>>>>> + u32 reg;
>>>>> +
>>>>> + reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>>>>> + reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>>>>> + reg |= range | op_type;
>>>>> + reg |= L3_MAINT_STATUS_START;
>>>>> + writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>>>>
>>>> Are there contents of L3_MAINT_CTRL that need to be preserved
>>>> across calls and can not be inferred? A 'readl()' is often expensive,
>>>> so it might be more efficient if you can avoid that.
>>>
>>> Right, this readl() can be replaced with readl_relaxed(). Thanks.
>>>
>>> I'll check and correct the readl() and writel() in other places.
>>
>> What I meant is that if you want to replace them, you should provide
>> performance numbers that show how much difference this makes
>> and add comments in the source code explaining how you proved that
>> the _relaxed() version is actually correct.
>
> Another clarification, as there are actually two independent
> points here:
>
> * if you can completely remove the readl() above and just write a
> hardcoded value into the register, or perhaps read the original
> value once at boot time, that is probably a win because it
> avoids one of the barriers in the beginning. The datasheet should
> tell you if there are any bits in the register that have to be
> preserved
Code coupling will become very strong.
>
> * Regarding the _relaxed() accessors, it's a lot harder to know
> whether that is safe, as you first have to show, in particular in case
> any of the accesses stop being guarded by the spinlock in that
> case, and whether there may be a case where you have to
> serialize the memory access against accesses that are still in the
> store queue or prefetched.
>
> Whether this matters at all depends mostly on the type of devices
> you are driving on your SoC. If you have any high-speed network
> interfaces that are unable to do cache coherent DMA, any extra
> instruction here may impact the number of packets you can transfer,
> but if all your high-speed devices are connected to a coherent
> interconnect, I would just go with the obvious approach and use
> the safe MMIO accessors everywhere.
In fact, this driver has been running on an earlier version for several years
and has not received any feedback about the performance issue. So I didn't
try to optimize it when I first sent these patches. I had to reconsider it
until you noticed it.
How about keeping it unchanged for the moment? It'll take a lot of time and
energy to retest.
>
> Arnd
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
>
More information about the linux-arm-kernel
mailing list