[PATCH v4 2/7] um: enable the use of optimized xor routines in UML
Anton Ivanov
anton.ivanov at cambridgegreys.com
Fri Dec 11 17:40:30 EST 2020
On 11/12/2020 22:00, Johannes Berg wrote:
> On Fri, 2020-12-11 at 21:57 +0000, Anton Ivanov wrote:
>>>> --- /dev/null
>>>> +++ b/arch/um/include/asm/xor-x86.h
>>>> @@ -0,0 +1 @@
>>>> +../../../x86/include/asm/xor.h
>>> Do these really need to be symlinks? Last I looked, it seemed that
>>> arch/x86/include/asm/ is actually in the include path?
>> It is included, but it is included quite far down the list.
> I see. So you're saying basically we'll get asm-generic/xor.h before the
> x86 version, and then we're getting the worst possible implementation,
> right?
An x86 implementation that is stuck at its "worst case scenario" defaults,
because alternatives replacement for the actual CPU features never ran, can
be as bad as the generic one. In some cases the generic one may even be
better :(
>> We pick up a few things out of there, but if we leave them "as is" they
>> all default to their least optimized versions. The results clearly
>> demonstrate that too - 30% difference on 64 bit and > 100% on 32 bit.
> Right.
>
>> This is because we do not perform alternatives substitution. Our
>> "alternatives" processing function in the UML startup is a noop.
> Oh, so we *do* get x86, but compatibility with ancient CPUs?
Not just ancient CPUs. Ancient BUGGY CPUs. It is usually the worst-case
scenario implementation.
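
To illustrate why a no-op alternatives pass leaves the slow path in place,
here is a minimal userspace sketch. It is an analogy only: the kernel patches
instruction sequences in place, whereas this sketch swaps a function pointer.
All names (FEAT_SSE2, alt_table, active_xor, apply_alternatives_noop) are
hypothetical and do not come from the x86 or UML trees.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bit; not the kernel's X86_FEATURE_* numbering. */
#define FEAT_SSE2 (1UL << 0)

typedef void (*xor_fn)(size_t bytes, unsigned char *dst,
		       const unsigned char *src);

/* Baseline: safe everywhere, one byte at a time. */
static void xor_bytewise(size_t bytes, unsigned char *dst,
			 const unsigned char *src)
{
	for (size_t i = 0; i < bytes; i++)
		dst[i] ^= src[i];
}

/* Stand-in for an optimized routine: one machine word at a time. */
static void xor_wordwise(size_t bytes, unsigned char *dst,
			 const unsigned char *src)
{
	size_t i = 0;

	for (; i + sizeof(unsigned long) <= bytes; i += sizeof(unsigned long)) {
		unsigned long a, b;

		memcpy(&a, dst + i, sizeof(a));
		memcpy(&b, src + i, sizeof(b));
		a ^= b;
		memcpy(dst + i, &a, sizeof(a));
	}
	for (; i < bytes; i++)	/* leftover tail bytes */
		dst[i] ^= src[i];
}

struct alt_entry {
	unsigned long required;	/* feature bits the replacement needs */
	xor_fn replacement;
};

static const struct alt_entry alt_table[] = {
	{ FEAT_SSE2, xor_wordwise },
};

/* The default stays active until something replaces it. */
static xor_fn active_xor = xor_bytewise;

/* A working startup pass: install every replacement the host supports. */
static void apply_alternatives(unsigned long host_features)
{
	for (size_t i = 0; i < sizeof(alt_table) / sizeof(alt_table[0]); i++)
		if ((host_features & alt_table[i].required) ==
		    alt_table[i].required)
			active_xor = alt_table[i].replacement;
}

/* A no-op pass: the features are there, but the default is never replaced. */
static void apply_alternatives_noop(unsigned long host_features)
{
	(void)host_features;
}
```

With the no-op variant, active_xor stays on xor_bytewise regardless of what
the host CPU can do, which is the situation described above.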
>
>> My idea was to override that to the extent possible and get whatever
>> mileage is possible without that.
> Makes sense.
>
>> I can give it a try to see how it looks if I use the x86 feature table
>> and other bits which are picked up from there, but working with that is
>> like pulling teeth without anaesthetic.
>>
>> On the positive side this means that we can copy the alternatives code
>> on x86.
>>
>> I can give it another go. I tried early on and it was a bit painful.
> Yeah, no, not sure ...
>
> Maybe just doing something like
>
> #include "../../../x86/include/asm/xor.h"
>
> would be acceptable? It seems a bit better to me in the sense of being
> more obvious than the symlinks... but dunno.
That (and everything else) relies on the "CPU feature available" macros.
I "cheated" on those and created our own using just one ulong holding the
5-6 bits which are relevant to us. That was the original idea behind
pulling things in and symlinking: to make sure they pick up OUR defs, not
the whole array of features out of the x86 tree.
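
A rough sketch of the "one ulong" idea, for illustration only. The bit
assignments, the uml_cpu_has() macro and parse_cpu_flags() are all
hypothetical names, and the choice of flags (mmx, sse, sse2, avx, osxsave)
is an assumption about which bits matter for the xor templates. The input
is assumed to be a space-separated flags string of the kind found in the
host's /proc/cpuinfo.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit layout: a handful of features in one unsigned long. */
#define UML_FEAT_MMX     (1UL << 0)
#define UML_FEAT_SSE     (1UL << 1)
#define UML_FEAT_SSE2    (1UL << 2)
#define UML_FEAT_AVX     (1UL << 3)
#define UML_FEAT_OSXSAVE (1UL << 4)

static unsigned long host_cpu_features;

/* Local replacement for a "does the CPU have X" test. */
#define uml_cpu_has(bit) ((host_cpu_features & (bit)) != 0)

/* Turn a whitespace-separated flags string into the feature mask. */
static unsigned long parse_cpu_flags(const char *flags)
{
	static const struct {
		const char *name;
		unsigned long bit;
	} known[] = {
		{ "mmx",     UML_FEAT_MMX },
		{ "sse",     UML_FEAT_SSE },
		{ "sse2",    UML_FEAT_SSE2 },
		{ "avx",     UML_FEAT_AVX },
		{ "osxsave", UML_FEAT_OSXSAVE },
	};
	unsigned long mask = 0;

	while (*flags) {
		/* Length of the current token, up to the next whitespace. */
		size_t len = strcspn(flags, " \t\n");

		for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
			/* Exact match only, so "avx2" does not set AVX. */
			if (strlen(known[i].name) == len &&
			    strncmp(flags, known[i].name, len) == 0)
				mask |= known[i].bit;

		flags += len;
		flags += strspn(flags, " \t\n");	/* skip separators */
	}
	return mask;
}
```

Code that would otherwise consult the full x86 feature array can then test
uml_cpu_has(UML_FEAT_SSE2) against this single word.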
We also need to noop or redefine a few things like fpu_exit, fpu_enter, etc.
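
As a sketch of the "noop" route, assuming the enter/exit hooks can simply
be compiled away in our case: the macro names below are hypothetical
stand-ins, and xor_2() is a simplified word-wise routine, not one of the
actual x86 templates.

```c
#include <stddef.h>

/* Hypothetical no-op stand-ins for the FPU enter/exit hooks. */
#define uml_fpu_enter() do { } while (0)
#define uml_fpu_exit()  do { } while (0)

/* XOR p2 into p1; bytes is assumed to be a multiple of the word size. */
static void xor_2(size_t bytes, unsigned long *p1, const unsigned long *p2)
{
	size_t words = bytes / sizeof(unsigned long);

	uml_fpu_enter();	/* compiles to nothing in this sketch */
	for (size_t i = 0; i < words; i++)
		p1[i] ^= p2[i];
	uml_fpu_exit();
}
```

The routine keeps the same bracketing as the original would, so the hooks
can later be redefined to do real work without touching the loop.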
However, based on the discussion we have had so far, I should revisit
this and do it ONLY where it is needed, not in all cases.
I will give it another go on Monday.
Looking at the results, a 30% difference is firmly in the realm of
"definitely worth it". For me it is a question of how to do it, not
"should we do it".
>
> johannes
>
>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/