[RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
christophe.leroy at c-s.fr
Mon Jan 20 09:08:23 PST 2020
On 20/01/2020 at 16:19, Segher Boessenkool wrote:
> On Mon, Jan 20, 2020 at 02:56:00PM +0000, Christophe Leroy wrote:
>>> Nice! Much better.
>>> It should be tested on more representative hardware, too, but this looks
>>> promising alright :-)
>> mpc832x (e300c2 core) at 333 MHz:
>> gettimeofday: vdso: 235 nsec/call
>> clock-gettime-realtime: vdso: 244 nsec/call
>> With the series:
>> gettimeofday: vdso: 271 nsec/call
>> clock-gettime-realtime: vdso: 281 nsec/call
> Those are important, and degrade ~15%. That is acceptable IMO, but do
> you see a way to optimise this (later)?
Not easy, I think.
First, we have the unavoidable ASM entry function, which can't be dropped
because the CR[SO] bit has to be set on error and cleared on success, and
that can't be done in C.
In our ASM VDSO, fixed shifts are used, while in the generic C VDSO, the
shift amounts are variable and read from the VDSO data.
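To illustrate the difference between the two shift styles, here is a rough
sketch in C. It is not the kernel code: the struct vd_sketch, its fields and
the FIXED_SHIFT value are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the VDSO data; not the kernel's layout. */
struct vd_sketch {
	uint32_t mult;
	uint32_t shift;
};

#define FIXED_SHIFT 20	/* arbitrary value chosen for this sketch */

/* Fixed shift: the amount is a compile-time constant. */
static uint64_t scale_fixed(uint64_t delta, uint32_t mult)
{
	return (delta * mult) >> FIXED_SHIFT;
}

/* Generic shift: the amount has to be loaded from the data first. */
static uint64_t scale_generic(const struct vd_sketch *vd, uint64_t delta)
{
	assert(vd->shift < 64);	/* shifting a u64 by 64 or more is undefined in C */
	return (delta * vd->mult) >> vd->shift;
}
```

With a constant amount, the compiler can pick the shorter instruction
sequence for a 64-bit shift on a 32-bit target at build time. With a
variable amount, it needs an extra load and a sequence that works for any
amount.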
And there is still some funny code generated by GCC (8.1), like:
620: 7d 29 3c 30 srw r9,r9,r7
624: 21 87 00 20 subfic r12,r7,32
628: 7d 07 3c 31 srw. r7,r8,r7
62c: 7d 08 60 30 slw r8,r8,r12
630: 7d 0b 4b 78 or r11,r8,r9
634: 39 40 00 00 li r10,0
638: 40 82 00 84 bne 6bc <__c_kernel_clock_gettime+0x114>
63c: 81 23 00 24 lwz r9,36(r3)
640: 81 05 00 00 lwz r8,0(r5)
6bc: 7d 69 5b 78 mr r9,r11
6c0: 7c ea 3b 78 mr r10,r7
6c4: 7d 2b 4b 78 mr r11,r9
6c8: 4b ff ff 74 b 63c <__c_kernel_clock_gettime+0x94>
This branch to 6bc is totally useless:
- copying r11 into r9 is pointless as r9 is overwritten in 63c
- copying back r9 into r11 is pointless as r11 has not been modified
- loading r10 with 0 then overwriting r10 with r7 when r7 is not 0 is
pointless as well; the result of srw. could have been put directly in r10.
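For reference, the instructions at 620-630 look like a 64-bit right shift
done on two 32-bit halves. A minimal sketch in C, assuming r8 holds the high
word, r9 the low word and r7 the shift amount (an interpretation of the
listing, not something it states), and only valid for amounts between 1 and
31:

```c
#include <assert.h>
#include <stdint.h>

/* Shift the 64-bit value hi:lo right by n, for 0 < n < 32. */
static uint64_t shr64_sketch(uint32_t hi, uint32_t lo, unsigned int n)
{
	assert(n > 0 && n < 32);	/* C shifts by 0..31 only on a 32-bit word */

	/* srw + subfic + slw + or: low word takes the bits falling out of hi */
	uint32_t new_lo = (lo >> n) | (hi << (32 - n));
	/* srw.: high word is simply shifted */
	uint32_t new_hi = hi >> n;

	return ((uint64_t)new_hi << 32) | new_lo;
}
```

Both halves of the result are already available after these operations, so
no branch on the high word is needed to assemble the 64-bit value.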