[PATCH v6 2/2] coresight: cti: Add Qualcomm extended CTI support
Mike Leach
mike.leach at linaro.org
Mon Dec 8 06:47:21 PST 2025
Hi Leo
On Fri, 5 Dec 2025 at 10:04, Leo Yan <leo.yan at arm.com> wrote:
>
> Hi Mike,
>
> On Thu, Dec 04, 2025 at 04:17:35PM +0000, Mike Leach wrote:
>
> [...]
>
> > The tables in the patch are
> >
> > [reg_type_array_index] = offset_address;
> >
> > e.g.
> >
> > [INDEX_CTIINTACK] = QCOM_CTIINTACK
> >
> > which resolves to
> >
> > [1] = 0x020
> >
> > where index is constant for a given register type,
> >
> > As far as I can tell what you have suggested above is a table that is
> >
> > [std_addr_offset] = qcom_addr_offset;
> >
> > e.g.
> >
> > [CTIINTACK] = QCOM_CTIINTACK,
> >
> > which resolves to
> >
> > [0x10] = 0x020
> >
> > which does not appear to work correctly?
Sorry - what I mean here is the contiguous array that appears to be in
the source, does not match the reality when compiled into memory - not
that it doesn't work programmatically.
> >
> > The registers are sparsely spread across the memory map, so a simple
> > mapping does not work, even if we divide the original offset by 4 to
> > create a register number.
>
> This should work. Though the array is not filled for each item, but
> it will return back 0x20 when we access array[0x10], I don't see
> problem here.
>
> > The largest standard offset we have is ITTRIGIN = 0xEF8, so assuming
> > the compiler allows us to sparselly populate the array (which I think
> > it does, along with some padding), we end up with an array of at least
> > 0xEF8 elements, rather then the indexed 21?
>
> I tested locally and did not see the GCC complaint for this approach.
> And this is a global structure with about 16KiB (~4K items x
Which is precisely the issue - why use 16k bytes of space when a pair
of indexed tables will use 21 x 32bit locations per table -> 168 bytes
- 100x smaller!
This space matters little to high end server systems but is much more
important in smaller embedded systems.
Moreover the table + inline helper is more efficient at extracting the
correct offset value. The helper is a simple de-reference - whereas
the helper functions you suggest require the code to make the
comparison at every register access.
The "if qcom ..." may be contained in one place in the source code,
but is called and executed for every access.
Why add inefficiencies, either in footprint or execution?
Regards
Mike
> sizeof(u32)), we don't need to worry about scaling issue as it is
> shared by device instances.
>
> If you dislike this way, then a static function also can fulfill the
> same task, something like:
>
> static noinline u32 cti_qcom_reg_off(u32 offset)
> {
> switch (offset) {
> CTIINTACK: return QCOM_CTIINTACK;
> CTIAPPSET: return QCOM_CTIAPPSET;
> ...
> default:
> WARN(1, "Unknown offset=%u\n", offset);
> return 0;
> }
>
> /* Should not run here, just for compiling */
> return 0;
> }
>
> Thanks,
> Leo
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
More information about the linux-arm-kernel
mailing list