crc32() optimization
Herman Oosthuysen
Herman at WirelessNetworksInc.com
Mon Nov 25 10:55:52 EST 2002
Is there not a look-up table based CRC32 elsewhere in the kernel already?
Multiple CRC32 algorithms seem to me to be a terrible waste.
--
------------------------------------------------------------------------
Herman Oosthuysen
B.Eng.(E), Member of IEEE
Wireless Networks Inc.
http://www.WirelessNetworksInc.com
E-mail: Herman at WirelessNetworksInc.com
Phone: 1.403.569-5687, Fax: 1.403.235-3965
------------------------------------------------------------------------
Marc Singer wrote:
> On Mon, Nov 11, 2002 at 02:37:33AM +0100, Wolfgang Denk wrote:
>
>>In message <20021111013114.GB27214 at buici.com> you wrote:
>>
>>>>>What's "Duff's Device"?
>>>>
>>>>It's a tricky way to implement general loop unrolling directly in C.
>>>>Applied to your problem, code that looks like this (instead of 8 any
>>>>other loop count may be used, but you need to adjust the "case"
>>>>statements then):
>>>>
>>>>    register int n = (len + (8-1)) / 8;
>>>>
>>>>    switch (len % 8) {
>>>>    case 0: do { val = crc32_table ... ;
>>>>    case 7:      val = crc32_table ... ;
>>>>    case 6:      val = crc32_table ... ;
>>>>    case 5:      val = crc32_table ... ;
>>>>    case 4:      val = crc32_table ... ;
>>>>    case 3:      val = crc32_table ... ;
>>>>    case 2:      val = crc32_table ... ;
>>>>    case 1:      val = crc32_table ... ;
>>>>            } while (--n > 0);
>>>>    }
>>>
>>>This doesn't look right to me. You are decrementing n but using the
>>>modulus of len in the switch. The len modulus is correct when n == 1,
>>>but not when n > 1. The idea makes sense, but the implementation
>>>appears to be missing a detail.
>>
>>You don't understand. The switch is only needed for the first,
>>partial loop where we want less than N statements; then we're running
>>the remaining fully unrolled loops in the do{}while loop.
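For anyone who wants to try the idea quoted above, here is a minimal, self-contained sketch of it. The statement bodies were elided in the original post, so the table step used here (the common reflected CRC-32 with polynomial 0xEDB88320) is an assumption, as are the names crc32_init, crc32_simple, crc32_duff and CRC_STEP; none of them come from the kernel or from the poster's code. As a worked case, with len = 11 we get n = (11 + 7) / 8 = 2 and 11 % 8 = 3, so the switch enters at case 3 and runs 3 steps, then the do/while runs one full pass of 8 steps, 11 in total. One detail the quoted fragment leaves open is len == 0, which would fall into case 0 and process 8 bytes, so the sketch guards against it.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];

/* Fill the table for the reflected CRC-32 polynomial 0xEDB88320
 * (an assumed choice; the original post does not show its table). */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc32_table[i] = c;
    }
}

/* One table step: fold the next input byte into val and advance p. */
#define CRC_STEP(val, p) \
    ((val) = crc32_table[((val) ^ *(p)++) & 0xff] ^ ((val) >> 8))

/* Reference version: one byte per loop iteration. */
static uint32_t crc32_simple(uint32_t val, const unsigned char *p, size_t len)
{
    while (len--)
        CRC_STEP(val, p);
    return val;
}

/* Duff's Device version, unrolled by 8. */
static uint32_t crc32_duff(uint32_t val, const unsigned char *p, size_t len)
{
    size_t n = (len + 7) / 8;   /* number of passes, partial one included */

    if (len == 0)               /* without this, case 0 would run 8 steps */
        return val;

    switch (len % 8) {          /* taken once: jump into the first pass */
    case 0: do { CRC_STEP(val, p);
    case 7:      CRC_STEP(val, p);
    case 6:      CRC_STEP(val, p);
    case 5:      CRC_STEP(val, p);
    case 4:      CRC_STEP(val, p);
    case 3:      CRC_STEP(val, p);
    case 2:      CRC_STEP(val, p);
    case 1:      CRC_STEP(val, p);
            } while (--n > 0);  /* every later pass runs all 8 steps */
    }
    return val;
}
```

Call crc32_init() once, then pass 0xFFFFFFFF as the initial value and invert the result to get the conventional CRC-32. Both functions should return the same value for any length; whether the unrolled one is actually faster depends on the compiler and the target, so it is worth measuring rather than assuming.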
>
>
> I see. I misread the code. I cannot see why this would not be better
> than the original poster's version. I'll test it on my code to see if
> there is an improvement.
>
>
>
>>>As for performance problems, I believe that the trouble is evident
>>>from the assembler output. The reason that the unrolled loop is more
>>>efficient than the simple loop is mainly because you don't jump as
>>>often. We all know that jumps tend to perturb the instruction fetch
>>>queue and cache.
>>
>>Did you enable optimization?
>
>
> Indeed. But it doesn't matter since it executes the switch jump only
> one time.
>
>