[PATCH] ARM: perf: ensure counter delta is limited to 32-bits

Will Deacon will.deacon at arm.com
Fri Jul 2 10:17:23 EDT 2010


> >  Performance counter stats for 'git status':
> >
> >          3650781413  cycles
> >           289950734  instructions             #      0.079 IPC
> >              144882  context-switches
> >               13677  page-faults
> >           473580406  branches
> >
> >        82.426290000  seconds time elapsed
> >
> >
> > Which looks insane to me. The IPC is appalling and we've taken
> > more branches than we've executed instructions!
> No, that certainly doesn't look right. Referring back your earlier email, when
> we do the cmpxchng, shouldn't it get sign extended then? An atomic64_t should
> be signed, but looking at the arm implementation, we're using u64's. Could
> this have anything to do with it? The generic implementation in lib/atomic64.c
> uses 'long long's.

It's a 64-bit cmpxchg and we're passing in 64-bit arguments so I don't
think any sign-extension should occur. The real issue is that we're taking
the difference between two u32 quantities (prev_raw_count and new_raw_count)
but using s64s, so when the delta is shifted right we treat the u32s as signed
when they're not.
 
> Providing that the sign extension does work, I've had a quick play with some
> uint64_t's and int64_t's in userspace and I think the algorithm is ok. Is the
> assumption of atomic64_t's being signed wrong?

I don't think the signedness of atomic64_t actually matters because everything
is handled in assembly anyway.

If you do something like:

#include <stdio.h>

typedef unsigned long u32;
typedef signed long long s64;
typedef unsigned long long u64;

int main(void)
{
	u64 x = 0xffffffff;
	u64 y = 0x0fffffff;
	s64 z = (x << 32) - (y << 32);
	z >>= 32;
	printf("0x%llx\n", z);
	return 0;
}

You'll get the sign-extended number printed out. Actually, a much neater
fix to this problem is to make delta unsigned and leave everything else
as-is (although I'm not a massive fan of all the 64-bit shifting).

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index c457686..de12536 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -201,7 +201,7 @@ armpmu_event_update(struct perf_event *event,
 {
        int shift = 64 - 32;
        s64 prev_raw_count, new_raw_count;
-       s64 delta;
+       u64 delta;
 
 again:
        prev_raw_count = atomic64_read(&hwc->prev_count);


What do you think?

Will






More information about the linux-arm-kernel mailing list