[PATCH V13 - RESEND 02/10] arm64/perf: Add BRBE registers and fields

Anshuman Khandual anshuman.khandual at arm.com
Mon Jul 31 05:19:19 PDT 2023



On 7/31/23 14:36, Mark Rutland wrote:
> On Mon, Jul 31, 2023 at 08:03:21AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 7/28/23 22:22, James Clark wrote:
>>>
>>>
>>> On 28/07/2023 17:20, Will Deacon wrote:
>>>> On Tue, Jul 11, 2023 at 01:54:47PM +0530, Anshuman Khandual wrote:
>>>>> This adds BRBE related register definitions and various other related field
>>>>> macros there in. These will be used subsequently in a BRBE driver which is
>>>>> being added later on.
>>>>>
>>>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>>>> Cc: Will Deacon <will at kernel.org>
>>>>> Cc: Marc Zyngier <maz at kernel.org>
>>>>> Cc: Mark Rutland <mark.rutland at arm.com>
>>>>> Cc: linux-arm-kernel at lists.infradead.org
>>>>> Cc: linux-kernel at vger.kernel.org
>>>>> Tested-by: James Clark <james.clark at arm.com>
>>>>> Reviewed-by: Mark Brown <broonie at kernel.org>
>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>>>>> ---
>>>>>  arch/arm64/include/asm/sysreg.h | 103 +++++++++++++++++++++
>>>>>  arch/arm64/tools/sysreg         | 158 ++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 261 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>>>>> index b481935e9314..f95e30c13c8b 100644
>>>>> --- a/arch/arm64/include/asm/sysreg.h
>>>>> +++ b/arch/arm64/include/asm/sysreg.h
>>>>> @@ -163,6 +163,109 @@
>>>>>  #define SYS_DBGDTRTX_EL0		sys_reg(2, 3, 0, 5, 0)
>>>>>  #define SYS_DBGVCR32_EL2		sys_reg(2, 4, 0, 7, 0)
>>>>>  
>>>>> +#define __SYS_BRBINFO(n)		sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 0))
>>>>> +#define __SYS_BRBSRC(n)			sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 1))
>>>>> +#define __SYS_BRBTGT(n)			sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 2))
>>>>
>>>> It's that time on a Friday but... aren't these macros busted? I think you
>>>> need brackets before adding the offset, otherwise wouldn't, for example,
>>>> target registers 0-15 all access info register 0 and __SYS_BRBTGT(16) would
>>>> then start accessing source register 0?
>>>>
>>>> I'm surprised that the compiler doesn't warn about this, but even more
>>>> surprised that you managed to test this.
>>>>
>>>> Please tell me I'm wrong!
>>>>
>>>> Will
>>>
>>> No I think you are right, it is wrong. Luckily there is already an
>>> extraneous bracket so you you can fix it by moving one a place down:
>>>
>>>   sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 2))
>>>
>>> It's interesting because the test [1] is doing quite a bit and looking
>>> at the branch info, and that src and targets match up to function names.
>>> I also manually looked at the branch buffers and didn't see anything
>>> obviously wrong like things that looked like branch infos in the source
>>> or target fields. Will have to take another look to see if it would be
>>> possible for the test to catch this.
>>>
>>> James
>>>
>>> [1]:
>>> https://gitlab.arm.com/linux-arm/linux-jc/-/commit/3a7ddce70c2daadb63fcc511de0a89055ca48b32
>>
>> ((((n) & 0x10)) >> 2 + 2) ---> ((((n) & 0x10) >> 2) + 2)
>>
>> The additional brackets are useful in explicitly telling the compiler but
>> what it the compiler is just doing the right thing implicitly i.e computing
>> the shifting operation before doing the offset addition.
> 
> No; that is not correct. In c, '+' has higher precedence than '>>'.
> 
> For 'a >> b + c' the compiler *must* treat that as 'a >> (b + c)', and not as
> '(a >> b) + c'
> 
> That's trivial to test:
> 
> | [mark at gravadlaks:~]% cat shiftadd.c 
> | #include <stdio.h>
> | 
> | unsigned long logshiftadd(unsigned long a,
> |                           unsigned long b,
> |                           unsigned long c)
> | {
> |         printf("%ld >> %ld + %ld is %ld\n",
> |                a, b, c, a >> b + c);
> | }
> | 
> | int main(int argc, char *argv)
> | {
> |         logshiftadd(0, 0, 0);
> |         logshiftadd(0, 0, 1);
> |         logshiftadd(0, 0, 2);
> |         printf("\n");
> |         logshiftadd(1024, 0, 0);
> |         logshiftadd(1024, 0, 1);
> |         logshiftadd(1024, 0, 2);
> |         printf("\n");
> |         logshiftadd(1024, 2, 0);
> |         logshiftadd(1024, 2, 1);
> |         logshiftadd(1024, 2, 2);
> | 
> |         return 0;
> | }
> | [mark at gravadlaks:~]% gcc shiftadd.c -o shiftadd
> | [mark at gravadlaks:~]% ./shiftadd 
> | 0 >> 0 + 0 is 0
> | 0 >> 0 + 1 is 0
> | 0 >> 0 + 2 is 0
> | 
> | 1024 >> 0 + 0 is 1024
> | 1024 >> 0 + 1 is 512
> | 1024 >> 0 + 2 is 256
> | 
> | 1024 >> 2 + 0 is 256
> | 1024 >> 2 + 1 is 128
> | 1024 >> 2 + 2 is 64

Understood.

> 
>> During testing, all > those captured branch records looked alright.
> 
> I think we clearly need better testing here
I am still thinking - how could this might have been missed. Could it be
possible that these wrongly computed higher indices were getting folded
back/rolled over into the same legal range indices for a given bank. If
they did, branch record processing would have still captured almost all
of them, may be in an incorrect order. Branch order does matter for the
stitched mechanism.



More information about the linux-arm-kernel mailing list