[PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

Leonardo Bras leobras at redhat.com
Thu Jan 4 22:59:44 PST 2024


On Thu, Jan 04, 2024 at 09:18:15PM -0800, Boqun Feng wrote:
> On Fri, Jan 05, 2024 at 01:45:42AM -0300, Leonardo Bras wrote:
> [...]
> > > > According to gcc.gnu.org:
> > > > 
> > > > ---
> > > > "memory" [clobber]:
> > > > 
> > > >     The "memory" clobber tells the compiler that the assembly code 
> > > >     performs memory reads or writes to items other than those listed in 
> > > >     the input and output operands (for example, accessing the memory 
> > > >     pointed to by one of the input parameters). To ensure memory contains 
> > > 
> > > Note here it says "other than those listed in the input and output
> > > operands", and in the above asm block, the memory pointed by "__ptr" is
> > > already marked as read-and-write by the asm block via "+A" (*__ptr), so
> > > the compiler knows the asm block may modify the memory pointed by
> > > "__ptr", therefore in _relaxed() case, "memory" clobber can be avoided.
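A minimal sketch of the relaxed case, assuming a GCC/Clang toolchain. On RISC-V, the relaxed 32-bit exchange can drop the "memory" clobber because "+A" (*p) already declares that *p is both read and written. The helper name xchg32_relaxed_sketch() is hypothetical. The non-RISC-V branch substitutes the __atomic_exchange_n() builtin so the sketch also builds on other hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Relaxed 32-bit exchange (hypothetical helper). "+A" (*p) declares that
 * *p is read and written, so the compiler already reloads *p afterwards.
 * Since no ordering is promised, no "memory" clobber is needed. */
static inline uint32_t xchg32_relaxed_sketch(uint32_t *p, uint32_t n)
{
	uint32_t r;
#if defined(__riscv) && defined(__riscv_atomic)
	__asm__ __volatile__ (
		"\tamoswap.w %0, %2, %1\n"
		: "=r" (r), "+A" (*p)
		: "r" (n));
#else
	/* Portable fallback for non-RISC-V hosts (assumption). */
	r = __atomic_exchange_n(p, n, __ATOMIC_RELAXED);
#endif
	return r;
}
```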
> > 
> > Thanks for pointing that out! 
> > That helped me improve my understanding of constraints for asm operands :)
> > (I ended up getting even more info from the GCC manual)
> > 
> > So the "+A" constraint means the operand is both read and written, and
> > it is a memory operand whose address is held in a register.
> > 
> > > 
> > > Here is an example showing the difference; consider the following case:
> > > 
> > > 	this_val = *this;
> > > 	that_val = *that;
> > > 	xchg_relaxed(this, 1);
> > > 	reread_this = *this;
> > > 
> > > By the semantics of _relaxed, compilers can optimize the above into:
> > > 
> > > 	this_val = *this;
> > > 	xchg_relaxed(this, 1);
> > > 	that_val = *that;
> > > 	reread_this = *this;
> > > 
> > 
> > Seems correct, since there is no barrier().
> > 
> > > but the "memory" clobber in the xchg_relaxed() will provide this.
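Boqun's this/that sequence can be written out as a sketch. Here xchg_relaxed_int() is a hypothetical stand-in built on __atomic_exchange_n(..., __ATOMIC_RELAXED), and barrier() follows the kernel's definition (an empty asm with a "memory" clobber). Bracketing the exchange with barrier() models what a "memory" clobber on the xchg asm itself adds: the load of *that can no longer be sunk past the exchange. In single-threaded execution both versions produce the same values. The difference lies only in which reorderings the compiler is allowed to perform.

```c
#include <assert.h>

/* Kernel-style compiler barrier: an empty asm with a "memory" clobber. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for xchg_relaxed() on a 32-bit int. */
static inline int xchg_relaxed_int(int *p, int n)
{
	return __atomic_exchange_n(p, n, __ATOMIC_RELAXED);
}

struct snap { int this_val, that_val, reread_this; };

/* No clobber: the compiler may move the *that_ptr load after the xchg. */
static struct snap seq_relaxed(int *this_ptr, int *that_ptr)
{
	struct snap s;
	s.this_val = *this_ptr;
	s.that_val = *that_ptr;		/* may be sunk below the exchange */
	xchg_relaxed_int(this_ptr, 1);
	s.reread_this = *this_ptr;	/* always reloaded: the xchg writes it */
	return s;
}

/* Models a "memory" clobber on the xchg: nothing crosses the exchange. */
static struct snap seq_clobbered(int *this_ptr, int *that_ptr)
{
	struct snap s;
	s.this_val = *this_ptr;
	s.that_val = *that_ptr;		/* must stay before the exchange */
	barrier();
	xchg_relaxed_int(this_ptr, 1);
	barrier();
	s.reread_this = *this_ptr;
	return s;
}
```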
> > 
> > By 'this' here you mean the barrier? I mean, IIUC "memory" clobber will 
> > avoid the above optimization, right?
> > 
> 
> Right, it seems I mistyped "provide" (I meant "prevent").
> 
> > > Needless to say, the '"+A" (*__ptr)' prevents the compiler from doing the
> > > following optimization:
> > > 
> > > 	this_val = *this;
> > > 	that_val = *that;
> > > 	xchg_relaxed(this, 1);
> > > 	reread_this = this_val;
> > > 
> > > since the compiler knows the asm block will read and write *this.
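For contrast with the '"+A" (*__ptr)' form used in the patch, here is a sketch of the case where the clobber becomes mandatory, assuming a RISC-V target with the A extension (other hosts fall back to __atomic_exchange_n()). If the asm receives only the address as an "r" operand, the compiler cannot tell that *p is accessed. The "memory" clobber is then the only thing that stops it from reusing a stale this_val. The names xchg32_addr_only() and reread_after_xchg() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the asm receives only the address (in a register), not
 * *p as an operand. The compiler therefore cannot tell that *p is read
 * and written, so the "memory" clobber is required even for a relaxed
 * exchange; without it, a cached copy of *p could be reused afterwards. */
static inline uint32_t xchg32_addr_only(uint32_t *p, uint32_t n)
{
	uint32_t r;
#if defined(__riscv) && defined(__riscv_atomic)
	__asm__ __volatile__ (
		"\tamoswap.w %0, %2, (%1)\n"
		: "=r" (r)
		: "r" (p), "r" (n)
		: "memory");
#else
	/* Portable fallback for non-RISC-V hosts (assumption). */
	r = __atomic_exchange_n(p, n, __ATOMIC_RELAXED);
#endif
	return r;
}

/* Boqun's sequence: the reread must observe the stored 1, not this_val. */
static uint32_t reread_after_xchg(uint32_t *p)
{
	uint32_t this_val = *p;

	(void)this_val;
	xchg32_addr_only(p, 1);
	return *p;
}
```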
> >  
> > Right, the compiler knows that address will be written by the asm block,
> > so it reloads the value instead of reusing the old one.
> > 
> 
> Correct.
> 
> > 
> > A question, though:
> 
> Good question ;-)
> 
> > Do we need the "memory" clobber in any other xchg / cmpxchg asm?
> 
> The "memory" clobber is needed for others, see below:
> 
> > I mean, usually the only memory write happens to *__ptr, which should be
> > covered by "+A".
> > 
> > I understand that since the others are not "relaxed", they need to
> > have a barrier, but isn't the compiler supposed to understand the barrier
> > instruction and avoid reordering/optimizing across that instruction?
> > 
> 
> The barrier semantics (ACQUIRE/RELEASE/FULL) are provided by the combined
> effort of both 1) preventing compiler reordering via the "memory" clobber
> and 2) preventing CPU/memory reordering via arch-specific instructions.
> 
> In other words, an asm block that contains a hardware barrier instruction
> should always have the "memory" clobber; otherwise, the compiler may
> reorder other memory accesses across the asm block and thereby break the
> ordering provided by the hardware instructions.
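The two-part mechanism described above can be seen in a sketch of a 32-bit acquire exchange, assuming a RISC-V target with the A extension. The trailing fence orders the hardware, while the "memory" clobber orders the compiler; dropping either one weakens the guarantee. SKETCH_ACQUIRE_BARRIER is a hypothetical stand-in modeled on the kernel's RISCV_ACQUIRE_BARRIER. Other hosts fall back to __atomic_exchange_n() with __ATOMIC_ACQUIRE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in modeled on the kernel's RISCV_ACQUIRE_BARRIER. */
#define SKETCH_ACQUIRE_BARRIER "\tfence r, rw\n"

static inline uint32_t xchg32_acquire_sketch(uint32_t *p, uint32_t n)
{
	uint32_t r;
#if defined(__riscv) && defined(__riscv_atomic)
	__asm__ __volatile__ (
		"\tamoswap.w %0, %2, %1\n"
		SKETCH_ACQUIRE_BARRIER		/* 2) hardware ordering */
		: "=r" (r), "+A" (*p)
		: "r" (n)
		: "memory");			/* 1) compiler ordering */
#else
	/* Portable fallback for non-RISC-V hosts (assumption). */
	r = __atomic_exchange_n(p, n, __ATOMIC_ACQUIRE);
#endif
	return r;
}
```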

Oh, I see.
So this means the compiler does not look for memory barrier instructions
inside the asm before reordering loads/stores, right?

Meaning the asm also needs a way to signal a compiler barrier, on top of
the hardware barrier instructions.

Thanks for helping me improve my understanding of this!
Leo

> 
> Regards,
> Boqun
> 
> > 
> > Thanks!
> > Leo
> > 
> > > Regards,
> > > Boqun
> > > 
> > > >     correct values, GCC may need to flush specific register values to 
> > > >     memory before executing the asm. Further, the compiler does not assume 
> > > >     that any values read from memory before an asm remain unchanged after 
> > > >     that asm; it reloads them as needed. Using the "memory" 
> > > >     effectively forms a read/write memory barrier for the compiler.
> > > > 
> > > >     Note that this clobber does not prevent the processor from doing 
> > > >     speculative reads past the asm statement. To prevent that, you need 
> > > >     processor-specific fence instructions.
> > > > ---
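The distinction the GCC text draws, a "memory" clobber as a compiler-only barrier versus a processor fence, can be sketched as follows. barrier() mirrors the kernel's definition. full_mb(), publish() and consume() are hypothetical names, and the RISC-V `fence rw, rw` branch is only compiled on RISC-V targets.

```c
#include <assert.h>

/* Compiler-only barrier, as in the kernel's barrier(): it stops the
 * compiler from moving memory accesses across it, nothing more. */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/* Compiler + CPU barrier: the "memory" clobber handles the compiler,
 * the fence handles the hardware. Non-RISC-V hosts use the GCC/Clang
 * seq_cst fence builtin instead (assumption for portability). */
#if defined(__riscv)
#define full_mb()	__asm__ __volatile__("fence rw, rw" : : : "memory")
#else
#define full_mb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

static int data;
static int flag;

/* Writer: barrier() here would keep the compiler from swapping the two
 * stores, but a weakly ordered CPU could still reorder them; full_mb()
 * rules out both. */
static void publish(int v)
{
	data = v;
	full_mb();
	__atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
}

/* Reader: returns -1 until the flag is observed, then the payload. */
static int consume(void)
{
	if (!__atomic_load_n(&flag, __ATOMIC_RELAXED))
		return -1;
	full_mb();
	return data;
}
```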
> > > > 
> > > > IIUC, the above text says that having memory accesses to *__ptr would
> > > > require the above asm to have the "memory" clobber, so that memory
> > > > accesses don't get reordered by the compiler.
> > > > 
> > > > By that reasoning, all asm in this file should have the "memory"
> > > > clobber, since all atomic operations change the memory pointed to by an
> > > > input pointer. Is that correct?
> > > > 
> > > > Thanks!
> > > > Leo
> > > > 
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Boqun
> > > > > 
> > > > > > -		break;							\
> > > > > > -	case 8:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			"	amoswap.d %0, %2, %1\n"			\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > -		break;							\
> > > > > > -	default:							\
> > > > > > -		BUILD_BUG();						\
> > > > > > -	}								\
> > > > > > -	__ret;								\
> > > > > > -})
> > > > > > -
> > > > > > -#define arch_xchg_relaxed(ptr, x)					\
> > > > > > -({									\
> > > > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > > > -	(__typeof__(*(ptr))) __xchg_relaxed((ptr),			\
> > > > > > -					    _x_, sizeof(*(ptr)));	\
> > > > > > +	__asm__ __volatile__ (						\
> > > > > > +		prepend							\
> > > > > > +		"	amoswap" sfx " %0, %2, %1\n"			\
> > > > > > +		append							\
> > > > > > +		: "=r" (r), "+A" (*(p))					\
> > > > > > +		: "r" (n)						\
> > > > > > +		: "memory");						\
> > > > > >  })
> > > > > >  
> > > > > > -#define __xchg_acquire(ptr, new, size)					\
> > > > > > +#define _arch_xchg(ptr, new, sfx, prepend, append)			\
> > > > > >  ({									\
> > > > > >  	__typeof__(ptr) __ptr = (ptr);					\
> > > > > > -	__typeof__(new) __new = (new);					\
> > > > > > -	__typeof__(*(ptr)) __ret;					\
> > > > > > -	switch (size) {							\
> > > > > > +	__typeof__(*(__ptr)) __new = (new);				\
> > > > > > +	__typeof__(*(__ptr)) __ret;					\
> > > > > > +	switch (sizeof(*__ptr)) {					\
> > > > > >  	case 4:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			"	amoswap.w %0, %2, %1\n"			\
> > > > > > -			RISCV_ACQUIRE_BARRIER				\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > +		__arch_xchg(".w" sfx, prepend, append,			\
> > > > > > +			      __ret, __ptr, __new);			\
> > > > > >  		break;							\
> > > > > >  	case 8:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			"	amoswap.d %0, %2, %1\n"			\
> > > > > > -			RISCV_ACQUIRE_BARRIER				\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > +		__arch_xchg(".d" sfx, prepend, append,			\
> > > > > > +			      __ret, __ptr, __new);			\
> > > > > >  		break;							\
> > > > > >  	default:							\
> > > > > >  		BUILD_BUG();						\
> > > > > >  	}								\
> > > > > > -	__ret;								\
> > > > > > +	(__typeof__(*(__ptr)))__ret;					\
> > > > > >  })
> > > > > >  
> > > > > > -#define arch_xchg_acquire(ptr, x)					\
> > > > > > -({									\
> > > > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > > > -	(__typeof__(*(ptr))) __xchg_acquire((ptr),			\
> > > > > > -					    _x_, sizeof(*(ptr)));	\
> > > > > > -})
> > > > > > +#define arch_xchg_relaxed(ptr, x)					\
> > > > > > +	_arch_xchg(ptr, x, "", "", "")
> > > > > >  
> > > > > > -#define __xchg_release(ptr, new, size)					\
> > > > > > -({									\
> > > > > > -	__typeof__(ptr) __ptr = (ptr);					\
> > > > > > -	__typeof__(new) __new = (new);					\
> > > > > > -	__typeof__(*(ptr)) __ret;					\
> > > > > > -	switch (size) {							\
> > > > > > -	case 4:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			RISCV_RELEASE_BARRIER				\
> > > > > > -			"	amoswap.w %0, %2, %1\n"			\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > -		break;							\
> > > > > > -	case 8:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			RISCV_RELEASE_BARRIER				\
> > > > > > -			"	amoswap.d %0, %2, %1\n"			\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > -		break;							\
> > > > > > -	default:							\
> > > > > > -		BUILD_BUG();						\
> > > > > > -	}								\
> > > > > > -	__ret;								\
> > > > > > -})
> > > > > > +#define arch_xchg_acquire(ptr, x)					\
> > > > > > +	_arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
> > > > > >  
> > > > > >  #define arch_xchg_release(ptr, x)					\
> > > > > > -({									\
> > > > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > > > -	(__typeof__(*(ptr))) __xchg_release((ptr),			\
> > > > > > -					    _x_, sizeof(*(ptr)));	\
> > > > > > -})
> > > > > > -
> > > > > > -#define __arch_xchg(ptr, new, size)					\
> > > > > > -({									\
> > > > > > -	__typeof__(ptr) __ptr = (ptr);					\
> > > > > > -	__typeof__(new) __new = (new);					\
> > > > > > -	__typeof__(*(ptr)) __ret;					\
> > > > > > -	switch (size) {							\
> > > > > > -	case 4:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			"	amoswap.w.aqrl %0, %2, %1\n"		\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > -		break;							\
> > > > > > -	case 8:								\
> > > > > > -		__asm__ __volatile__ (					\
> > > > > > -			"	amoswap.d.aqrl %0, %2, %1\n"		\
> > > > > > -			: "=r" (__ret), "+A" (*__ptr)			\
> > > > > > -			: "r" (__new)					\
> > > > > > -			: "memory");					\
> > > > > > -		break;							\
> > > > > > -	default:							\
> > > > > > -		BUILD_BUG();						\
> > > > > > -	}								\
> > > > > > -	__ret;								\
> > > > > > -})
> > > > > > +	_arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
> > > > > >  
> > > > > >  #define arch_xchg(ptr, x)						\
> > > > > > -({									\
> > > > > > -	__typeof__(*(ptr)) _x_ = (x);					\
> > > > > > -	(__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr)));	\
> > > > > > -})
> > > > > > +	_arch_xchg(ptr, x, ".aqrl", "", "")
> > > > > >  
> > > > > >  #define xchg32(ptr, x)							\
> > > > > >  ({									\
> > > > > > -- 
> > > > > > 2.43.0
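The deduplication in this patch rests on C string-literal concatenation: the ordering suffix, the leading fence and the trailing fence are pasted into a single asm template, and the size switch only selects ".w" or ".d". A portable sketch of just that pasting step, assuming fence strings modeled on the kernel's RISCV_ACQUIRE_BARRIER / RISCV_RELEASE_BARRIER. The SKETCH_* and XCHG_TEMPLATE names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fence strings, modeled on the kernel's definitions. */
#define SKETCH_ACQUIRE_BARRIER	"\tfence r, rw\n"
#define SKETCH_RELEASE_BARRIER	"\tfence rw, w\n"

/* Same shape as the template in __arch_xchg(): prepend + op + append. */
#define XCHG_TEMPLATE(sfx, prepend, append) \
	prepend "\tamoswap" sfx " %0, %2, %1\n" append

/* One template per flavour, mirroring the arguments that
 * arch_xchg_relaxed/_acquire/_release and arch_xchg pass to _arch_xchg(). */
static const char *const tmpl_relaxed_w = XCHG_TEMPLATE(".w" "", "", "");
static const char *const tmpl_acquire_w =
	XCHG_TEMPLATE(".w" "", "", SKETCH_ACQUIRE_BARRIER);
static const char *const tmpl_release_d =
	XCHG_TEMPLATE(".d" "", SKETCH_RELEASE_BARRIER, "");
static const char *const tmpl_full_d = XCHG_TEMPLATE(".d" ".aqrl", "", "");
```

Because every piece is a string literal, the concatenation happens at compile time and each flavour still compiles to a single asm statement.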
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 




More information about the linux-riscv mailing list