makedumpfile: optimize is_zero_page
Atsushi Kumagai
kumagai-atsushi at mxc.nes.nec.co.jp
Tue Mar 4 23:55:00 EST 2014
Hello Marc,
>There are local complaints that filtering out only zero pages is slow. I
>found that is_zero_page was inefficient. It checks if the page contains any
>non-zero bytes - one byte at a time.
>
>Improve performance by checking for non-zero data 64 bits at a time. Also,
>unroll the loop for additional performance.
>
>Did testing in x86_64 mode on an Intel Xeon x5560 system with 18GB RAM.
>Executed:
> time makedumpfile -d 1 /proc/vmcore <destination>
>
>The amount of time taken in User space was reduced by 75%. The total time to
>dump memory was reduced by 28%.
Thanks for your good work, but...
>is_zero_page
>Signed-off-by: Marc Milgram <mmilgram at redhat.com>
>---
>diff --git a/makedumpfile.h b/makedumpfile.h
>index 3d270c6..0f211c4 100644
>--- a/makedumpfile.h
>+++ b/makedumpfile.h
>@@ -1634,10 +1634,27 @@ static inline int
> is_zero_page(unsigned char *buf, long page_size)
> {
> size_t i;
>+ unsigned long long *vect = (unsigned long long *) buf;
>+ long page_len = page_size / (sizeof(unsigned long long));
>
>- for (i = 0; i < page_size; i++)
>- if (buf[i])
>+ for (i = 0; i < page_len; i+=8) {
>+ if (vect[i])
> return FALSE;
>+ if (vect[i+1])
>+ return FALSE;
>+ if (vect[i+2])
>+ return FALSE;
>+ if (vect[i+3])
>+ return FALSE;
>+ if (vect[i+4])
>+ return FALSE;
>+ if (vect[i+5])
>+ return FALSE;
>+ if (vect[i+6])
>+ return FALSE;
>+ if (vect[i+7])
>+ return FALSE;
>+ }
> return TRUE;
> }
It looks messy. I don't like this kind of manual loop unrolling, and it
seems to improve performance only a little according to my small test:
(test results for 4k page)
   8 bits x   1 line  x 4096 loops (current code) : user 0m7.045s
  64 bits x   1 line  x  512 loops                : user 0m1.847s
  64 bits x   8 lines x   64 loops (your patch)   : user 0m1.546s
  64 bits x 512 lines x    1 loop                 : user 0m1.642s
(dump information)
Original pages : 0x000000000013073b
Excluded pages : 0x0000000000108b4e
Pages filled with zero : 0x0000000000108b4e
Cache pages : 0x0000000000000000
Cache pages + private : 0x0000000000000000
User process data pages : 0x0000000000000000
Free pages : 0x0000000000000000
Hwpoison pages : 0x0000000000000000
Remaining pages : 0x0000000000027bed
(The number of pages is reduced to 13%.)
Memory Hole : 0x000000000003f8c5
--------------------------------------------------
Total pages : 0x0000000000170000
So I think the loop unrolling is unnecessary.
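For reference, the "64 bits x 1 line" variant measured above could look
roughly like the sketch below. This is only an illustration under some
assumptions, not the code that was benchmarked: the name
is_zero_page_words is hypothetical, memcpy is used so the buffer does not
have to be 8-byte aligned, and a byte-wise tail loop is added in case
page_size is not a multiple of the word size.

```c
#include <stddef.h>
#include <string.h>

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

/*
 * Hypothetical helper (not the function in makedumpfile.h):
 * checks a page one "unsigned long long" word at a time,
 * without manual loop unrolling.
 */
static inline int
is_zero_page_words(const unsigned char *buf, long page_size)
{
	const size_t word = sizeof(unsigned long long);
	size_t len = (page_size > 0) ? (size_t)page_size : 0;
	size_t nwords = len / word;
	size_t i;
	unsigned long long v;

	/* One word per iteration; the compiler is free to unroll this. */
	for (i = 0; i < nwords; i++) {
		/* memcpy avoids assuming that buf is suitably aligned. */
		memcpy(&v, buf + i * word, word);
		if (v)
			return FALSE;
	}

	/* Assumption: handle leftover bytes if len is not a multiple of word. */
	for (i = nwords * word; i < len; i++)
		if (buf[i])
			return FALSE;

	return TRUE;
}
```

With a 4k page this runs the first loop 512 times on a platform where
unsigned long long is 8 bytes, and the tail loop does nothing.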
Thanks
Atsushi Kumagai
>_______________________________________________
>kexec mailing list
>kexec at lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec