JFFS3 & performance

Artem B. Bityuckiy dedekind at infradead.org
Fri Jan 7 10:20:11 EST 2005


On Fri, 7 Jan 2005 jasmine at linuxgrrls.org wrote:

> > On Fri, 7 Jan 2005 jasmine at linuxgrrls.org wrote:
> >
> > But the reason I included those loops comes from our discussions
> > about backward/forward CRC calculations and CPU cache benefits. CRCs are
> > calculated by reading bytes, so I don't think we should take the
> > cache line size into account in our particular case.
> 
> You're wrong, because:
> 
> i)  The instruction cache suffers from this penalty and is, in fact, the
>     major issue here.  Most of the wasted cycles will be waiting for an
>     instruction to arrive from the i-cache.
> 
> ii) All data accesses use the cache, even byte accesses.  (Byte accesses
>     actually reach the processor's data port as word accesses in any
>     case.)  Data is fetched from the interconnect into the cache in
>     eight-word-long bursts in OMAP1623, regardless of how much the core
>     has asked for.  Critical-word-first means that the core only has
>     to wait for the first word to arrive, but there is an additional
>     cycle of penalty as the line is latched. If your data area is
>     aligned to an eight-word boundary, and your algorithm works eight
>     words to a stride, the branch will cover the penalty of traversing
>     the cache line boundary.  This will be slightly faster.
> 
> Does this explain?

I'm sorry, not exactly. OK, could you please write the code you think is
better? Currently it is:

/* Trash the CPU data cache */
trash_cache();
ts1 = TIMESTAMP();
for (j = 0; j < memsizes[i]; j++)
	mem[i][j] = mem[i][j] + 1;
for (j = 0; j < memsizes[i]; j++)
	mem[i][j] = mem[i][j] + 1;
ts2 = TIMESTAMP();

Here the mem[i] buffers are 32-byte aligned (kmalloc does this; see
mm/slab.c, the ARCH_KMALLOC_FLAGS definition).

So I suppose you are suggesting something like:

/* Trash the CPU data cache */
trash_cache();
ts1 = TIMESTAMP();
for (j = 0; j < memsizes[i]; j += L1_CACHE_BYTES)
	mem[i][j] = mem[i][j] + 1;
for (j = 0; j < memsizes[i]; j += L1_CACHE_BYTES)
	mem[i][j] = mem[i][j] + 1;
ts2 = TIMESTAMP();

? 
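
Another possible reading of point ii) is that the loop should handle one
whole eight-word line per outer iteration, so that the loop branch falls on
each cache line boundary. Below is only a rough user-space sketch of that
reading, not the test code itself. The 32-byte line size, the LINE_BYTES and
LINE_WORDS macros, and the helpers touch_by_line() and alloc_lines() are all
assumptions made up for illustration; aligned_alloc() (C11) stands in for
kmalloc here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed line size: eight 32-bit words, as described for the OMAP above. */
#define LINE_BYTES 32u
#define LINE_WORDS (LINE_BYTES / sizeof(uint32_t))

/*
 * Hypothetical helper: increment every word of buf, one cache line per
 * outer iteration, so the outer loop branch coincides with each line
 * boundary.
 */
static void touch_by_line(uint32_t *buf, size_t bytes)
{
	size_t nwords = bytes / sizeof(uint32_t);
	size_t j, k;

	/* The stride only lines up with the cache if the buffer does too. */
	assert(((uintptr_t)buf % LINE_BYTES) == 0);
	assert(bytes % LINE_BYTES == 0);

	for (j = 0; j < nwords; j += LINE_WORDS)
		for (k = 0; k < LINE_WORDS; k++)
			buf[j + k] = buf[j + k] + 1;
}

/*
 * Hypothetical helper: allocate a zero-filled, line-aligned buffer.
 * bytes must be a multiple of LINE_BYTES (an aligned_alloc requirement).
 */
static uint32_t *alloc_lines(size_t bytes)
{
	uint32_t *buf = aligned_alloc(LINE_BYTES, bytes);
	size_t j;

	if (buf)
		for (j = 0; j < bytes / sizeof(uint32_t); j++)
			buf[j] = 0;
	return buf;
}
```

In the test this would replace each of the two loops, i.e. two calls to
touch_by_line(mem[i], memsizes[i]) between the two TIMESTAMP() reads. A
compiler may unroll the inner loop, so whether the branch really hides the
line-crossing penalty would still have to be measured.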

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.



