[PATCH 3/3] coresight etm4x: Add 32-bit read/write option to split 64-bit words

Al Grant Al.Grant at arm.com
Wed Mar 8 02:06:46 PST 2023


> Hi Steve,
> 
> On Sun, Mar 05, 2023 at 10:54:08PM -0700, Steve Clevenger wrote:
> > An Ampere Computing design decision to not support 64-bit read/write
> > access in the ETMv4.6 implementation.
> >
> > The Ampere work around is to split ETMv4.6 64-bit register access into
> > 2 ea. 32-bit read/write accesses.
> > AC03_DEBUG_10 is described in the AmpereOne Developer Errata document:
> >
> > https://solutions.amperecomputing.com/customer-connect/products/Ampere
> > One-device-documentation
> >
> > Signed-off-by: Steve Clevenger <scclevenger at os.amperecomputing.com>
> > ---
> >  drivers/hwtracing/coresight/coresight-etm4x.h | 58
> > ++++++++++++++-----
> >  1 file changed, 44 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h
> > b/drivers/hwtracing/coresight/coresight-etm4x.h
> > index 434f4e95ee17..17457ec71876 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> > +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> > @@ -539,11 +539,6 @@
> >  		 readl_relaxed((csa)->base + (offset)) :		\
> >  		 read_etm4x_sysreg_offset((offset), false)))
> >
> > -#define etm4x_relaxed_read64(csa, offset)				\
> > -	((u64)((csa)->io_mem ?						\
> > -		 readq_relaxed((csa)->base + (offset)) :		\
> > -		 read_etm4x_sysreg_offset((offset), true)))
> > -
> >  #define etm4x_read32(csa, offset)					\
> >  	({								\
> >  		u32 __val = etm4x_relaxed_read32((csa), (offset));	\
> > @@ -567,15 +562,6 @@
> >  						  false);		\
> >  	} while (0)
> >
> > -#define etm4x_relaxed_write64(csa, val, offset)
> 	\
> > -	do {								\
> > -		if ((csa)->io_mem)					\
> > -			writeq_relaxed((val), (csa)->base + (offset));	\
> > -		else							\
> > -			write_etm4x_sysreg_offset((val), (offset),	\
> > -						  true);		\
> > -	} while (0)
> > -
> >  #define etm4x_write32(csa, val, offset)
> 	\
> >  	do {								\
> >  		__io_bw();						\
> > @@ -1091,6 +1077,50 @@ void etm4_config_trace_mode(struct
> etmv4_config
> > *config);
> >  u64 etm4x_sysreg_read(u32 offset, bool _relaxed, bool _64bit);  void
> > etm4x_sysreg_write(u64 val, u32 offset, bool _relaxed, bool _64bit);
> >
> > +/* 64-bit aligned to convert 64-bit access to 2 ea. 32-bit access */
> > +#pragma pack(push, 8)
> > +
> > +struct etm_quad_split {
> > +	u32 lsw;
> > +	u32 msw;
> > +};
> > +
> > +#pragma pack(pop)
> > +
> > +static inline u64 etm4x_relaxed_read64(struct csdev_access *csa,
> > +unsigned int offset) {
> > +	if (csa->io_mem) {
> > +		if (csa->no_quad_mmio) {
> > +			/* split 64-bit reads into 2 consecutive 32-bit reads */
> > +			struct etm_quad_split container;
> > +
> > +			container.lsw = etm4x_read32(csa, offset);
> > +			container.msw = etm4x_read32(csa, offset +
> sizeof(u32));
> > +
> > +			return *(u64 *) &container;
> 
> To be honest, I am not familiar with this part, just want to remind two things.
> 
> Firstly, here you could use hi_lo_readq_relaxed(), seems to me,
> hi_lo_readq_relaxed() does the same thing with above section and you don't
> need to define the structure etm_quad_split().
> 
> Secondly, IIUC, a main problem with splitting 64-bit access into two 32-bit
> accesses is breaking atomicity.  If here have race condition between reading and
> writing 64-bit registers, we need to consider to use spinlock for register
> accessing.

That shouldn't be an issue. 32-bit memory-mapped interface is 
normal for ETMs. The peripheral bus interface to ETMs (debug APB)
is maximum 32-bit. The ETM architecture defines that registers are
only single-copy atomic up to 32-bit. This is generally true of
CoreSight programming registers.

What typically happens to a 64-bit read/write from a CPU is that it
is split into two 32-bit accesses by a downsizing bridge on the path
from the main system interconnect to the debug APB (see 4.3.5 
in the ETM architecture spec). In this case, it sounds like there is no
downsizing bridge. But because there should be no existing code
relying on 64-bit atomic access to ETM registers, it should be safe
to split the accesses in software without worrying about atomicity.

Overall the ETM configuration involves complicated interactions
between multiple registers, so if you've got one CPU reading out the
configuration while another CPU is writing it, you've likely got much
bigger problems than 64-bit atomicity.

Al


> 
> I'd like to leave the second issue to Suzuki/Mike/James for comfirmation in case
> I introduced complexity.
> 
> > +		} else
> > +			return readq_relaxed(csa->base + offset);
> > +	} else
> > +		return read_etm4x_sysreg_offset(offset, true);
> 
> Here need to add brackets:
> 
>         } else {
>                 return read_etm4x_sysreg_offset(offset, true);
>         }
> 
> > +}
> > +
> > +static inline void etm4x_relaxed_write64(struct csdev_access *csa,
> > +u64 quad, unsigned int offset) {
> > +	if (csa->io_mem) {
> > +		/* split 64-bit writes into 2 consecutive 32-bit writes */
> > +		if (csa->no_quad_mmio) {
> > +			struct etm_quad_split container;
> > +
> > +			*(u64 *) &container = quad;
> > +
> > +			etm4x_relaxed_write32(csa, container.lsw, offset);
> > +			etm4x_relaxed_write32(csa, container.msw, offset +
> sizeof(u32));
> > +		} else
> > +			writeq_relaxed(quad, csa->base + offset);
> > +	} else
> > +		write_etm4x_sysreg_offset(quad, offset, true);		\
> 
> Ditto.  Please drop the character '\' at the end of the line.
> 
> Thanks,
> Leo
> 
> P.s. I have a ADLink AVA platform (Ampere Altra SoC with 32 CPUs), I am glad to
> give a test if you can confirm this patch set can apply on it (and please clarify if
> there have any prerequisite for firmwares).
> 
> > +}
> > +
> >  static inline bool etm4x_is_ete(struct etmv4_drvdata *drvdata)  {
> >  	return drvdata->arch >= ETM_ARCH_ETE;
> > --
> > 2.25.1
> >
> _______________________________________________
> CoreSight mailing list -- coresight at lists.linaro.org To unsubscribe send an email
> to coresight-leave at lists.linaro.org



More information about the linux-arm-kernel mailing list