[RFC] change non-atomic bitops method
Kirill A. Shutemov
kirill at shutemov.name
Tue Feb 3 02:39:32 PST 2015
On Tue, Feb 03, 2015 at 03:17:30AM +0200, Kirill A. Shutemov wrote:
> Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 GHz, 2.8 GHz Turbo
> with 3MB LLC):
I screwed up the inner loop condition and step. As a result, the benchmark
touched the same cache line 8 times and scanned only SIZE/8 of memory. The
fixed test is attached.
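
For illustration, the effect described above can be reproduced by counting
stores and distinct cache lines for a given loop bound and step. The earlier
loop is not shown in this message, so the "broken" bounds in the note below
are only one guess that is consistent with the description. `walk` and
`struct coverage` are hypothetical names, and an 8-byte unsigned long is
assumed.

```c
#include <assert.h>

#define CACHE_LINE 64

/* Hypothetical helper type: what a strided walk over the buffer touches. */
struct coverage {
	unsigned long stores;	/* loop iterations, i.e. stores issued */
	unsigned long lines;	/* distinct cache lines touched */
};

/*
 * Walk indices 0, step, 2*step, ... below limit in an array of
 * unsigned long and count stores and distinct cache lines.
 * Indices only increase, so comparing against the previous line
 * is enough to detect a new one.
 */
static struct coverage walk(unsigned long limit, unsigned long step)
{
	struct coverage c = { 0, 0 };
	unsigned long i, line, last = (unsigned long)-1;

	assert(step > 0);
	for (i = 0; i < limit; i += step) {
		line = i * sizeof(unsigned long) / CACHE_LINE;
		if (line != last) {
			c.lines++;
			last = line;
		}
		c.stores++;
	}
	return c;
}
```

With an 8-byte unsigned long, walk(SIZE / CACHE_LINE, 1) issues 8 stores per
cache line and covers only SIZE/8 bytes, while
walk(SIZE / sizeof(unsigned long), CACHE_LINE / sizeof(unsigned long)) issues
one store per line over the whole buffer.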
                                   Avg       Stddev
baseline                           14.0663   0.0182
-DCHECK_BEFORE_SET                 13.8594   0.0458
-DCACHE_HOT                        12.3896   0.0867
-DCACHE_HOT -DCHECK_BEFORE_SET     11.7480   0.2497
And now it's faster *with* the check. Sometimes the CPU is just too clever. ;)
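
Since the subject line is about non-atomic bitops, here is a rough sketch of
what check-before-set could look like for a single bit. This is not the
kernel's __set_bit(); `sketch_set_bit_checked` and the local `BITS_PER_LONG`
definition are hypothetical and only meant to illustrate the pattern.

```c
#include <limits.h>

/* Local stand-in, defined here for the sketch only. */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Set bit nr in the bitmap at addr, skipping the store if it is already set. */
static inline void sketch_set_bit_checked(unsigned long nr, unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long *p = addr + nr / BITS_PER_LONG;

	if (!(*p & mask))
		*p |= mask;
}
```

When the bit is already set, the word is only read and no store is issued.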
--
Kirill A. Shutemov
-------------- next part --------------
#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

#ifdef CACHE_HOT
#define SIZE	(2UL << 20)
#define TIMES	100000
#else
#define SIZE	(1UL << 30)
#define TIMES	100
#endif

#define CACHE_LINE	64

int main(int argc, char **argv)
{
	struct timespec a, b, diff;
	unsigned long i, *p, times = TIMES;

	p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &a);
	while (times--) {
		/* One store per cache line across the whole buffer. */
		for (i = 0; i < SIZE / sizeof(*p);
				i += CACHE_LINE / sizeof(*p)) {
#ifdef CHECK_BEFORE_SET
			if (p[i] != times)
#endif
				p[i] = times;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &b);

	/* Elapsed time, borrowing a second if the nanoseconds underflow. */
	diff.tv_sec = b.tv_sec - a.tv_sec;
	if (a.tv_nsec > b.tv_nsec) {
		diff.tv_sec--;
		diff.tv_nsec = 1000000000 + b.tv_nsec - a.tv_nsec;
	} else
		diff.tv_nsec = b.tv_nsec - a.tv_nsec;

	/* time_t has no portable printf format, so cast tv_sec to long. */
	printf("%ld.%09ld\n", (long)diff.tv_sec, diff.tv_nsec);
	return 0;
}