[PATCH] asm-generic: introduce io_stop_wc() and add implementation for ARM64

Xiongfeng Wang wangxiongfeng2 at huawei.com
Mon Dec 20 04:23:54 PST 2021


Hi, Catalin

On 2021/12/17 19:59, Catalin Marinas wrote:
> On Fri, Dec 17, 2021 at 04:56:11PM +0800, Xiongfeng Wang wrote:
>> For memory accesses with Normal-Non Cacheable attributes, the CPU may do
> 
> You may want to mention "arm64 Normal Non-Cacheable" as other
> architectures have a different meaning of NC.
> 
>> write combining. But in some situation, this is bad for the performance
>> because the prior access may wait too long just to be merged.
>>
>> We introduce io_stop_wc() to prevent the Normal-NC memory accesses before
>> this instruction to be merged with memory accesses after this
>> instruction.
>>
>> We add implementation for ARM64 using DGH instruction and provide NOP
>> implementation for other architectures.
>>
>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2 at huawei.com>
>> ---
>>  Documentation/memory-barriers.txt |  9 +++++++++
>>  arch/arm64/include/asm/barrier.h  |  9 +++++++++
>>  include/asm-generic/barrier.h     | 11 +++++++++++
>>  3 files changed, 29 insertions(+)
>>
>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>> index 7367ada13208..b868b51b1801 100644
>> --- a/Documentation/memory-barriers.txt
>> +++ b/Documentation/memory-barriers.txt
>> @@ -1950,6 +1950,15 @@ There are some more advanced barrier functions:
>>       For load from persistent memory, existing read memory barriers are sufficient
>>       to ensure read ordering.
>>  
>> + (*) io_stop_wc();
>> +
>> +     For memory accesses with Normal-Non Cacheable attributes (e.g. those
>> +     returned by ioremap_wc()), the CPU may do write combining. But in some
>> +     situation, this is bad for the performance because the prior access may
>> +     wait too long just to be merged. io_stop_wc() can be used to prevent
>> +     merging memory accesses with Normal-Non Cacheable attributes before this
>> +     instruction with any memory accesses appearing after this instruction.
> 
> I'm fine with the patch in general but the comment here and in
> asm-generic/barrier.h should avoid Normal Non-Cacheable as that's an
> arm-specific term. Looking at Documentation/driver-api/device-io.rst, we
> could simply say "write-combining". Something like:
> 
>      For memory accesses with write-combining attributes (e.g. those
>      returned by ioremap_wc()), the CPU may wait for prior accesses to
>      be merged with subsequent ones. io_stop_wc() can be used to prevent
>      the merging of write-combining memory accesses before this macro
>      with those after it when such wait has performance implications.
> 
> (feel free to rephrase it but avoid Normal NC here)

Thanks for your advice! I will send another version soon.

Thanks,
Xiongfeng

> 
>> +
>>  ===============================
>>  IMPLICIT KERNEL MEMORY BARRIERS
>>  ===============================
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 1c5a00598458..62217be36217 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -26,6 +26,14 @@
>>  #define __tsb_csync()	asm volatile("hint #18" : : : "memory")
>>  #define csdb()		asm volatile("hint #20" : : : "memory")
>>  
>> +/*
>> + * Data Gathering Hint:
>> + * This instruction prevents merging memory accesses with Normal-NC or
>> + * Device-GRE attributes before the hint instruction with any memory accesses
>> + * appearing after the hint instruction.
>> + */
>> +#define dgh()		asm volatile("hint #6" : : : "memory")
> 
> This is fine, arm-specific code.
> 
>> +
>>  #ifdef CONFIG_ARM64_PSEUDO_NMI
>>  #define pmr_sync()						\
>>  	do {							\
>> @@ -46,6 +54,7 @@
>>  #define dma_rmb()	dmb(oshld)
>>  #define dma_wmb()	dmb(oshst)
>>  
>> +#define io_stop_wc()	dgh()
>>  
>>  #define tsb_csync()								\
>>  	do {									\
>> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
>> index 640f09479bdf..083be6d34cb9 100644
>> --- a/include/asm-generic/barrier.h
>> +++ b/include/asm-generic/barrier.h
>> @@ -251,5 +251,16 @@ do {									\
>>  #define pmem_wmb()	wmb()
>>  #endif
>>  
>> +/*
>> + * ioremap_wc() maps I/O memory as memory with Normal-Non Cacheable attributes.
>> + * The CPU may do write combining for this kind of memory access. io_stop_wc()
>> + * prevents merging memory accesses with Normal-Non Cacheable attributes
>> + * before this instruction with any memory accesses appearing after this
>> + * instruction.
> 
> Please change this as well along the lines of my comment above.
> 
>> + */
>> +#ifndef io_stop_wc
>> +#define io_stop_wc do { } while (0)
>> +#endif
>> +
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* __ASM_GENERIC_BARRIER_H */
> 
> Thanks.
> 



More information about the linux-arm-kernel mailing list