[PATCH 2/2] ARM: lib: use LDRD/STRD for data copy
Boojin Kim
boojin.kim at samsung.com
Tue Mar 27 20:19:49 EDT 2012
Russell King wrote:
> Sent: Tuesday, March 27, 2012 4:41 PM
> To: Boojin Kim
> Cc: linux-arm-kernel at lists.infradead.org; 'Catalin Marinas'; 'Nicolas Pitre';
> kgene.kim at samsung.com
> Subject: Re: [PATCH 2/2] ARM: lib: use LDRD/STRD for data copy
>
> On Tue, Mar 27, 2012 at 09:27:52AM +0900, Boojin Kim wrote:
> > This patch uses LDRD/STRD that loads and stores data as DWORD unit.
> > It brings better performance than LDRM/STRM with cortex-a15.
>
> Why should I bother looking at this rubbish? You've been told before
> that using ldrd and strd unconditionally is not acceptable. Stop
> wasting peoples review time.
This patch gives better memcpy results on Cortex-A15.
Please see the following results, which I measured on Cortex-A15.
The 2nd column is the default memcpy.
The 3rd column is memcpy using ldrd/strd with this patch.
The 4th column is memcpy using ldrd/strd plus the PLD optimization from my 1st patch.
===================================================================
Memcpy performance (size in bytes, results in MBps)
===================================================================
   size  default      ldrd/strd    ldrd/strd + PLD opti
===================================================================
     64  1245.615434  1565.004006  1565.004006
    128  1743.861607  2393.535539  2491.230867
    256  2199.46509   3212.376645  3487.723214
    512  2569.901316  4137.976695  4479.644495
   1024  2880.715339  4245.923913  5250.336022
   2048  3623.608534  4752.128954  5365.728022
   4096  4120.516878  5119.593709  5710.891813
   8192  4431.366988  5126.312336  5440.45961
  16384  4603.712434  5040.322581  5529.016277
  32768  4559.381383  4712.002413  5238.893546
  65536  3483.446661  3513.802215  3516.965843
 131072  3495.623479  3498.460677  3506.31136
 262144  3484.02921   3475.987876  3499.783013
 524288  3427.662608  3430.037525  3454.637159
1048576  2263.903195  2225.9222    2458.911587
2097152  1732.182125  1703.940362  1833.96223
4194304  1713.663165  1708.351146  1781.780052
===================================================================
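A per-size throughput measurement of this kind can be sketched as below.
This is only a minimal sketch and not the test program behind the numbers
in this mail. It runs in user space, so it times the C library memcpy
rather than the kernel one. The names memcpy_mbps() and
print_memcpy_table() are hypothetical, and it assumes 1 MB = 1024 * 1024
bytes and about 64 MiB copied in total per size.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch): copy `size` bytes
 * `iterations` times and return the throughput in MB/s.
 * Assumption: 1 MB = 1024 * 1024 bytes.
 * Returns a negative value on allocation or timing failure.
 */
static double memcpy_mbps(size_t size, unsigned long iterations)
{
	unsigned char *src = malloc(size);
	unsigned char *dst = malloc(size);
	struct timespec t0, t1;
	double secs, mbps = -1.0;
	unsigned long i;

	if (!src || !dst)
		goto out;

	/* Touch both buffers so page faults are not timed. */
	memset(src, 0xa5, size);
	memset(dst, 0, size);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iterations; i++) {
		memcpy(dst, src, size);
		/* Compiler barrier (GCC/Clang) so the copy is not elided. */
		__asm__ __volatile__("" : : "r"(dst) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (secs > 0.0)
		mbps = ((double)size * (double)iterations) /
		       (1024.0 * 1024.0) / secs;
out:
	free(src);
	free(dst);
	return mbps;
}

/* Print one row per size, doubling from 64 bytes to 4 MiB. */
static void print_memcpy_table(void)
{
	size_t size;

	for (size = 64; size <= 4194304; size *= 2) {
		/* Copy roughly 64 MiB in total at every size. */
		unsigned long iterations = (64UL * 1024 * 1024) / size;

		printf("%8zu  %12.6f\n", size, memcpy_mbps(size, iterations));
	}
}
```

Calling print_memcpy_table() from main() prints one row per size. Absolute
values depend on the CPU, the cache sizes and the C library, so only runs
on the same machine are comparable.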
I think it brings meaningful improvements around the cache boundary, so I tried it again.
I also saw your earlier review, so I made this patch take effect on Cortex-A15 only, and only if the machine selects it.
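The idea of enabling the doubleword path only when it is selected at build
time can be illustrated in portable C as below. This is a sketch under
stated assumptions, not the code in the patch: the real change is in the
ARM assembly copy routines, CONFIG_HYPOTHETICAL_COPY_DWORD is an invented
symbol used only for illustration, and copy_units() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical build-time switch; NOT the symbol used by the patch.
 * When defined, data moves in 64-bit units (analogous to LDRD/STRD);
 * otherwise it moves in 32-bit units.
 */
#ifdef CONFIG_HYPOTHETICAL_COPY_DWORD
typedef uint64_t copy_unit_t;
#else
typedef uint32_t copy_unit_t;
#endif

/* Copy n bytes from src to dst; the buffers must not overlap. */
static void *copy_units(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	copy_unit_t tmp;

	/* Bulk of the data, one unit per step. */
	while (n >= sizeof(tmp)) {
		/* Fixed-size memcpy avoids alignment and aliasing problems. */
		memcpy(&tmp, s, sizeof(tmp));
		memcpy(d, &tmp, sizeof(tmp));
		s += sizeof(tmp);
		d += sizeof(tmp);
		n -= sizeof(tmp);
	}

	/* Remaining tail bytes. */
	while (n--)
		*d++ = *s++;

	return dst;
}
```

Building with -DCONFIG_HYPOTHETICAL_COPY_DWORD selects the 64-bit unit;
without it the 32-bit path is used, so builds that do not select the
option keep the word-sized copy.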
Thanks for your time and review :)