crc32() optimization
Marc Singer
elf at buici.com
Sun Nov 10 23:42:36 EST 2002
On Mon, Nov 11, 2002 at 02:37:33AM +0100, Wolfgang Denk wrote:
> In message <20021111013114.GB27214 at buici.com> you wrote:
> >
> > > > What's "Duff's Device"?
> > >
> > > It's a tricky way to implement general loop unrolling directly in C.
> > > Applied to your problem, code that looks like this (instead of 8 any
> > > other loop count may be used, but you need to adjust the "case"
> > > statements then):
> > >
> > > register int n = (len + (8-1)) / 8;
> > >
> > > switch (len % 8) {
> > > case 0: do { val = crc32_table ... ;
> > > case 7: val = crc32_table ... ;
> > > case 6: val = crc32_table ... ;
> > > case 5: val = crc32_table ... ;
> > > case 4: val = crc32_table ... ;
> > > case 3: val = crc32_table ... ;
> > > case 2: val = crc32_table ... ;
> > > case 1: val = crc32_table ... ;
> > > } while (--n > 0);
> > > }
> >
> > This doesn't look right to me. You are decrementing n but using the
> > modulus of len in the switch. The len modulus is correct when n == 1,
> > but not when n > 1. The idea makes sense, but the implementation
> > appears to be missing a detail.
>
> You don't understand. The switch is only needed for the first,
> partial loop where we want less than N statements; then we're nunning
> the remaining fully unrolled loos in the do{}while loop.
I see. I misread the code. I cannot see why this would not be better
than the original poster's version. I'll test it on my code to see if
there is an improvement.
> > As for performance problems, I believe that the trouble is evident
> > from the assembler output. The reason that the unrolled loop is more
> > efficient than the simple loop is mainly because you don't jump as
> > often. We all know that jumps tend to perturb the instruction fetch
> > queue and cache.
>
> Did you enable optimization?
Indeed. But it doesn't matter since it executes the switch jump only
one time.
More information about the linux-mtd
mailing list