[PATCH] perf/arm-cmn: Reduce stack usage during discovery

Ilkka Koskinen ilkka at os.amperecomputing.com
Sat Jun 28 18:54:35 PDT 2025


On Fri, 27 Jun 2025, Robin Murphy wrote:
> Arnd reports that Clang's aggressive inlining of arm_cmn_discover() can
> lead to stack frame size warnings, and while we could simply prevent
> such inlining to hide the issue, it seems more productive to actually
> heed the warning and do something about the overall stack footprint.
> The xp_region array is already rather large, and CMN_MAX_XPS might only
> grow larger in future, however it only serves as a convenience to save
> repeating the first level's worth of register reads in the second pass
> of discovery. There's no performance concern here, and it only takes a
> small tweak to the flow to re-extract the offsets instead of stashing
> them, so let's just do that and save several hundred bytes of stack.
>
> Reported-by: Arnd Bergmann <arnd at kernel.org>
> Signed-off-by: Robin Murphy <robin.murphy at arm.com>


Sounds reasonable, the patch looks good to me, and no changes in the map 
file, just as expected. Thus,

 	Reviewed-and-tested-by: Ilkka Koskinen <ilkka at os.amperecomputing.com>

> ---
> drivers/perf/arm-cmn.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index 031d45d0fe3d..02ffc8d09328 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -2245,12 +2245,11 @@ static enum cmn_node_type arm_cmn_subtype(enum cmn_node_type type)
>
> static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> {
> -	void __iomem *cfg_region;
> +	void __iomem *cfg_region, __iomem *xp_region;
> 	struct arm_cmn_node cfg, *dn;
> 	struct arm_cmn_dtm *dtm;
> 	enum cmn_part part;
> 	u16 child_count, child_poff;
> -	u32 xp_offset[CMN_MAX_XPS];
> 	u64 reg;
> 	int i, j;
> 	size_t sz;
> @@ -2302,11 +2301,12 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> 	cmn->num_dns = cmn->num_xps;
>
> 	/* Pass 1: visit the XPs, enumerate their children */
> +	cfg_region += child_poff;
> 	for (i = 0; i < cmn->num_xps; i++) {
> -		reg = readq_relaxed(cfg_region + child_poff + i * 8);
> -		xp_offset[i] = reg & CMN_CHILD_NODE_ADDR;
> +		reg = readq_relaxed(cfg_region + i * 8);
> +		xp_region = cmn->base + (reg & CMN_CHILD_NODE_ADDR);
>
> -		reg = readq_relaxed(cmn->base + xp_offset[i] + CMN_CHILD_INFO);
> +		reg = readq_relaxed(xp_region + CMN_CHILD_INFO);
> 		cmn->num_dns += FIELD_GET(CMN_CI_CHILD_COUNT, reg);
> 	}
>
> @@ -2332,11 +2332,12 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> 	cmn->dns = dn;
> 	cmn->dtms = dtm;
> 	for (i = 0; i < cmn->num_xps; i++) {
> -		void __iomem *xp_region = cmn->base + xp_offset[i];
> 		struct arm_cmn_node *xp = dn++;
> 		unsigned int xp_ports = 0;
>
> -		arm_cmn_init_node_info(cmn, xp_offset[i], xp);
> +		reg = readq_relaxed(cfg_region + i * 8);
> +		xp_region = cmn->base + (reg & CMN_CHILD_NODE_ADDR);
> +		arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, xp);
> 		/*
> 		 * Thanks to the order in which XP logical IDs seem to be
> 		 * assigned, we can handily infer the mesh X dimension by
> -- 
> 2.39.2.101.g768bb238c484.dirty
>
>



More information about the linux-arm-kernel mailing list