crc32() optimization

Mon Nov 25 10:55:52 EST 2002

Is there not a look-up table based CRC32 elsewhere in the kernel already?

Multiple CRC32 algorithms seem to me to be a terrible waste.
-- 

------------------------------------------------------------------------
Herman Oosthuysen
B.Eng.(E), Member of IEEE
Wireless Networks Inc.
http://www.WirelessNetworksInc.com
E-mail: Herman at WirelessNetworksInc.com
Phone: 1.403.569-5687, Fax: 1.403.235-3965
------------------------------------------------------------------------

Marc Singer wrote:
> On Mon, Nov 11, 2002 at 02:37:33AM +0100, Wolfgang Denk wrote:
> 
>>In message <20021111013114.GB27214 at buici.com> you wrote:
>>
>>>>>What's "Duff's Device"?
>>>>
>>>>It's a tricky way to implement general loop unrolling directly in  C.
>>>>Applied  to your problem, code that looks like this (instead of 8 any
>>>>other loop count may be used, but  you  need  to  adjust  the  "case"
>>>>statements then):
>>>>
>>>>	register int n = (len + (8-1)) / 8;
>>>>
>>>>	switch (len % 8) {
>>>>	case 0: do {	val = crc32_table ... ;
>>>>	case 7:		val = crc32_table ... ;
>>>>	case 6:		val = crc32_table ... ;
>>>>	case 5:		val = crc32_table ... ;
>>>>	case 4:		val = crc32_table ... ;
>>>>	case 3:		val = crc32_table ... ;
>>>>	case 2:		val = crc32_table ... ;
>>>>	case 1:		val = crc32_table ... ;
>>>>		} while (--n > 0);
>>>>	}
>>>
>>>This doesn't look right to me.  You are decrementing n but using the
>>>modulus of len in the switch.  The len modulus is correct when n == 1,
>>>but not when n > 1.  The idea makes sense, but the implementation
>>>appears to be missing a detail.
>>
>>You don't understand. The  switch  is  only  needed  for  the  first,
>>partial loop where we want less than N statements; then we're nunning
>>the remaining fully unrolled loos in the do{}while loop.
> 
> 
> I see.  I misread the code.  I cannot see why this would not be better
> than the original poster's version.  I'll test it on my code to see if
> there is an improvement. 
> 
> 
> 
>>>As for performance problems, I believe that the trouble is evident
>>>from the assembler output.  The reason that the unrolled loop is more
>>>efficient than the simple loop is mainly because you don't jump as
>>>often.  We all know that jumps tend to perturb the instruction fetch
>>>queue and cache.
>>
>>Did you enable optimization?
> 
> 
> Indeed.  But it doesn't matter since it executes the switch jump only
> one time.
> 
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 

-- 

------------------------------------------------------------------------
Herman Oosthuysen
B.Eng.(E), Member of IEEE
Wireless Networks Inc.
http://www.WirelessNetworksInc.com
E-mail: Herman at WirelessNetworksInc.com
Phone: 1.403.569-5687, Fax: 1.403.235-3965
------------------------------------------------------------------------