[PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

Fri Jan 29 21:51:12 EST 2021

On 2021/1/29 18:33, Russell King - ARM Linux admin wrote:
> On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote:
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>>   hardcoded value into the register, or perhaps read the original
>>   value once at boot time, that is probably a win because it
>>   avoids one of the barriers in the beginning. The datasheet should
>>   tell you if there are any bits in the register that have to be
>>   preserved
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>>   whether that is safe, as you first have to show, in particular in case
>>   any of the accesses stop being guarded by the spinlock in that
>>   case, and whether there may be a case where you have to
>>   serialize the memory access against accesses that are still in the
>>   store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
> 
> For L2 cache code, I would say the opposite, actually, because it is
> all too easy to get into a deadlock otherwise.
> 
> If you implement the sync callback, that will be called from every
> non-relaxed accessor, which means if you need to take some kind of
> lock in the sync callback and elsewhere in the L2 cache code, you will
> definitely deadlock.
> 
> It is safer to put explicit barriers where it is necessary.
> 
> Also remember that the barrier in readl() etc is _after_ the read, not
> before, and the barrier in writel() is _before_ the write, not after.
> The point is to ensure that DMA memory accesses are properly ordered
> with the IO-accessing instructions.

Yes, I known it. writel() must be used for the write operations that control
"start/stop" or "enable/disable" function, to ensure that the data of previous
write operations reaches the target. I've met this kind of problem before.

> 
> So, using readl_relaxed() with a read-modify-write is entirely sensible
> provided you do not access DMA memory inbetween.

Actually, I don't think this register is that complicated. I copied the code
back below. All the bits of L3_MAINT_CTRL are not affected by DMA access operations.
The software change the "range | op_type" to specify the operation type and scope,
the set the bit "L3_MAINT_STATUS_START" to start the operation. Then wait for that
bit to change from 1 to 0 by hardware.

+	reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+	reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
+	reg |= range | op_type;
+	reg |= L3_MAINT_STATUS_START;
+	writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
+
+	/* Wait until the hardware maintenance operation is complete. */
+	do {
+		cpu_relax();
+		reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+	} while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);

>