JFFS3 & performance
jasmine at linuxgrrls.org
Fri Jan 7 09:55:29 EST 2005
> On Fri, 7 Jan 2005 jasmine at linuxgrrls.org wrote:
>
> But the reason why I include those loops is dictated by our discussions
> about backward/forward CRC calculations and CPU cache benefits. CRCs are
> calculated by reading bytes. So I don't think we should take the cache
> line size into account in our particular case.
You're wrong, because:
i) The instruction cache suffers from this penalty and is, in fact, the
major issue here. Most of the wasted cycles will be spent waiting for an
instruction to arrive from the i-cache.
ii) All data accesses use the cache, even byte accesses. (Byte accesses
actually reach the processor's data port as word accesses in any
case.) On the OMAP1623, data is fetched from the interconnect into the
cache in eight-word bursts, regardless of how much the core has asked
for. Critical-word-first means that the core only has to wait for the
first word to arrive, but there is an additional cycle of penalty as
the line is latched. If your data area is aligned to an eight-word
boundary, and your algorithm works in strides of eight words, the
branch will cover the penalty of crossing the cache line boundary.
This will be slightly faster.
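To make the eight-word stride concrete, here is a minimal sketch in C. It is not code from JFFS3: the function names, the bitwise CRC-32 routine, and the assumption of 32-bit words (so one line is 32 bytes) are mine, chosen only for illustration. The byte-at-a-time loop and the line-stride loop compute the same CRC; the second one only arranges for the outer loop's branch to coincide with a cache line boundary, provided the caller passes a line-aligned buffer. Whether that is measurably faster depends on the core and the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit words, so an eight-word line is 32 bytes. */
#define LINE_WORDS 8u
#define LINE_BYTES (LINE_WORDS * sizeof(uint32_t))

/* Fold one byte into a reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Reference version: one byte per loop iteration, no regard for lines. */
static uint32_t crc32_simple(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = crc32_byte(crc, *p++);
    return ~crc;
}

/* Line-stride version: each outer iteration consumes exactly one
 * cache line, so the outer branch is taken at a line boundary.
 * The buffer must start on a line boundary. */
static uint32_t crc32_line_stride(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert((uintptr_t)p % LINE_BYTES == 0);

    while (len >= LINE_BYTES) {
        /* Fixed trip count: the compiler is free to unroll this. */
        for (size_t i = 0; i < LINE_BYTES; i++)
            crc = crc32_byte(crc, p[i]);
        p += LINE_BYTES;
        len -= LINE_BYTES;
    }
    /* Tail shorter than one line. */
    while (len--)
        crc = crc32_byte(crc, *p++);
    return ~crc;
}
```

The result is identical either way, so the stride can be changed without touching the on-flash format; only the loop structure and the buffer alignment differ.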
Does this explain it?
-J.