crc32() optimization

Joakim Tjernlund joakim.tjernlund at lumentis.se
Mon Nov 25 11:12:00 EST 2002


Yes, there are. In 2.5 these have been replaced with a common library (see lib/crc32.c).
I have a backport to 2.4, but first we will try to get the optimized version into 2.5,
then do a new backport to 2.4, and then try to get that into 2.4.

The final 2.5 patch should appear sometime today on lkml.

  Jocke

> 
> Is there not a look-up table based CRC32 elsewhere in the kernel already?
> 
> Multiple CRC32 algorithms seem to me to be a terrible waste.
> -- 
> 
> ------------------------------------------------------------------------
> Herman Oosthuysen
> B.Eng.(E), Member of IEEE
> Wireless Networks Inc.
> http://www.WirelessNetworksInc.com
> E-mail: Herman at WirelessNetworksInc.com
> Phone: 1.403.569-5687, Fax: 1.403.235-3965
> ------------------------------------------------------------------------
> 
> 
> Marc Singer wrote:
> > On Mon, Nov 11, 2002 at 02:37:33AM +0100, Wolfgang Denk wrote:
> > 
> >>In message <20021111013114.GB27214 at buici.com> you wrote:
> >>
> >>>>>What's "Duff's Device"?
> >>>>
> >>>>It's a tricky way to implement general loop unrolling directly in  C.
> >>>>Applied  to your problem, code that looks like this (instead of 8 any
> >>>>other loop count may be used, but  you  need  to  adjust  the  "case"
> >>>>statements then):
> >>>>
> >>>>	register int n = (len + (8-1)) / 8;
> >>>>
> >>>>	switch (len % 8) {
> >>>>	case 0: do {	val = crc32_table ... ;
> >>>>	case 7:		val = crc32_table ... ;
> >>>>	case 6:		val = crc32_table ... ;
> >>>>	case 5:		val = crc32_table ... ;
> >>>>	case 4:		val = crc32_table ... ;
> >>>>	case 3:		val = crc32_table ... ;
> >>>>	case 2:		val = crc32_table ... ;
> >>>>	case 1:		val = crc32_table ... ;
> >>>>		} while (--n > 0);
> >>>>	}
> >>>
> >>>This doesn't look right to me.  You are decrementing n but using the
> >>>modulus of len in the switch.  The len modulus is correct when n == 1,
> >>>but not when n > 1.  The idea makes sense, but the implementation
> >>>appears to be missing a detail.
> >>
> >>You don't understand. The  switch  is  only  needed  for  the  first,
> >>partial loop where we want less than N statements; then we're running
> >>the remaining fully unrolled loops in the do{}while loop.
> > 
> > 
> > I see.  I misread the code.  I cannot see why this would not be better
> > than the original poster's version.  I'll test it on my code to see if
> > there is an improvement. 
> > 
> > 
> > 
> >>>As for performance problems, I believe that the trouble is evident
> >>>from the assembler output.  The reason that the unrolled loop is more
> >>>efficient than the simple loop is mainly because you don't jump as
> >>>often.  We all know that jumps tend to perturb the instruction fetch
> >>>queue and cache.
> >>
> >>Did you enable optimization?
> > 
> > 
> > Indeed.  But it doesn't matter since it executes the switch jump only
> > one time.
> > 
> > 
> > ______________________________________________________
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> > 
> 
> 
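
For reference, the Duff's Device loop quoted above can be filled in roughly as
follows. This is only a sketch: the quoted code elides the table update
("val = crc32_table ... ;"), so the CRC_STEP macro, the function names
crc32_init, crc32_simple and crc32_duff, and the reflected polynomial
0xEDB88320 are all assumptions made for illustration, not the code from the
thread or from lib/crc32.c. The len == 0 check is also an addition: without
it, the quoted form would execute eight steps for an empty buffer, because
len % 8 is 0 and the body runs before n is tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Look-up table, filled by crc32_init() (hypothetical helper). */
static uint32_t crc32_table[256];

/* Build the table for the reflected polynomial 0xEDB88320 (assumed). */
void crc32_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;
		for (int k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		crc32_table[i] = c;
	}
}

/* One table-driven update step; consumes one byte from p. */
#define CRC_STEP(val, p) \
	((val) = crc32_table[((val) ^ *(p)++) & 0xff] ^ ((val) >> 8))

/* Reference version: one byte per loop iteration. */
uint32_t crc32_simple(uint32_t val, const unsigned char *buf, size_t len)
{
	while (len--)
		CRC_STEP(val, buf);
	return val;
}

/* Same update, unrolled eight times with Duff's Device. */
uint32_t crc32_duff(uint32_t val, const unsigned char *buf, size_t len)
{
	size_t n = (len + (8 - 1)) / 8;	/* passes through the do/while */

	assert(buf != NULL || len == 0);
	if (len == 0)			/* guard added for this sketch */
		return val;

	/* The switch jumps into the loop body only once, to handle the
	 * first, partial pass of len % 8 bytes; every later pass runs
	 * all eight steps. */
	switch (len % 8) {
	case 0: do {	CRC_STEP(val, buf);
	case 7:		CRC_STEP(val, buf);
	case 6:		CRC_STEP(val, buf);
	case 5:		CRC_STEP(val, buf);
	case 4:		CRC_STEP(val, buf);
	case 3:		CRC_STEP(val, buf);
	case 2:		CRC_STEP(val, buf);
	case 1:		CRC_STEP(val, buf);
		} while (--n > 0);
	}
	return val;
}
```

To use it, call crc32_init() once, then pass 0xFFFFFFFF as the initial val and
XOR the result with 0xFFFFFFFF. With that convention the nine ASCII bytes
"123456789" should give the usual CRC-32 check value 0xCBF43926 from either
function. Whether the unrolled form is actually faster depends on the compiler
and CPU, so it needs measuring.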




