JFFS3 & performance

Fri Jan 7 09:55:29 EST 2005

> On Fri, 7 Jan 2005 jasmine at linuxgrrls.org wrote:
>
> But the reason I include those loops comes from our discussions
> about backward/forward CRC calculations and CPU cache benefits. CRCs are
> calculated by reading bytes, so I don't think we should take the cache
> line size into account in our particular case.

You're wrong, because:

i)  The instruction cache suffers from this penalty and is, in fact, the
    major issue here.  Most of the wasted cycles will be waiting for an
    instruction to arrive from the i-cache.

ii) All data accesses use the cache, even byte accesses.  (Byte accesses
    actually reach the processor's data port as word accesses in any
    case.)  Data is fetched from the interconnect into the cache in
    eight-word bursts on the OMAP1623, regardless of how much the core
    has asked for.  Critical-word-first means that the core only has
    to wait for the first word to arrive, but there is an additional
    cycle of penalty as the line is latched.  If your data area is
    aligned to an eight-word boundary, and your algorithm works in
    strides of eight words, the branch will cover the penalty of
    traversing the cache line boundary.  This will be slightly faster.
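
As a rough sketch of point ii), not code from JFFS3: the C below computes
the same CRC-32 two ways, once byte by byte and once with the buffer walked
one eight-word line per outer-loop iteration. The function names, the
LINE_BYTES constant (eight 32-bit words assumed) and the bitwise CRC-32
itself are illustrative assumptions. It only shows the loop shape; whether
it is actually faster depends on the core and on what the compiler does
with the inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a cache line is eight 32-bit words, i.e. 32 bytes. */
#define LINE_BYTES 32u

/* Fold one byte into a bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Reference version: plain byte-at-a-time loop, no regard for line size. */
uint32_t crc32_bytewise(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32_byte(crc, buf[i]);
    return ~crc;
}

/* Same CRC, but the body consumes exactly one cache line per outer pass. */
uint32_t crc32_by_line(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);

    /* Head: consume bytes until buf sits on a line boundary. */
    while (len > 0 && ((uintptr_t)buf % LINE_BYTES) != 0) {
        crc = crc32_byte(crc, *buf++);
        len--;
    }

    /* Body: whole lines, so the outer branch coincides with each
     * crossing from one cache line to the next. */
    while (len >= LINE_BYTES) {
        for (unsigned i = 0; i < LINE_BYTES; i++)
            crc = crc32_byte(crc, buf[i]);
        buf += LINE_BYTES;
        len -= LINE_BYTES;
    }

    /* Tail: whatever is left after the last whole line. */
    while (len > 0) {
        crc = crc32_byte(crc, *buf++);
        len--;
    }
    return ~crc;
}
```

Both functions return the same value for any buffer; only the order in
which the loop branches fall relative to line boundaries differs.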

Does this explain it?

-J.