[PATCH] arm64: Add support for Half precision floating point

Tue Feb 2 09:31:22 PST 2016

On 28/01/16 16:51, Adhemerval Zanella wrote:
> On 28-01-2016 14:07, Will Deacon wrote:
>> On Tue, Jan 26, 2016 at 10:25:38PM +0530, Siddhesh Poyarekar wrote:
>>> Adding Adhemerval to cc since he had volunteered to follow up on this,
>>> mainly because he had a couple of additional ideas on the kernel
>>> front.
>>>
>>> On Tue, Jan 26, 2016 at 04:21:43PM +0000, Suzuki K. Poulose wrote:
>>>> On 26/01/16 16:02, Will Deacon wrote:
>>>>> Hi Suzuki,
>>>>>
>>>>> On Tue, Jan 26, 2016 at 03:52:46PM +0000, Suzuki K Poulose wrote:
>>>>>> ARMv8.2 extensions [1] include an optional feature, which supports
>>>>>> half precision (16-bit) floating point/asimd data processing
>>>>>> instructions. This patch adds support for detecting the feature and
>>>>>> exposing it to userspace via HWCAPs.
>>>>
>>>>
>>>>>> +#define HWCAP_FPHP		(1 << 9)
>>>>>> +#define HWCAP_ASIMDHP		(1 << 10)
>>>>>
>>>>> Where did we get to with the mrs trapping you proposed here?
>>>>>
>>>>>   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/374609.html
>>>>
>>>> We are yet to get some feedback from glibc/gcc folks. Siddhesh was looking
>>>> to make use of it [2]. But haven't heard anything back. Ramana mentioned
>>>> (in private) that they had some plans to take a look at it.
>>>
>>> I believe one of Adhemerval's ideas was similar to what I had
>>> mentioned back then, which was to provide all of the CPU information
>>> in a single file instead of having to traverse a directory structure.
>>
>> My understanding was that libc needed this information extremely early
>> on (i.e. before it could even issue system calls), and therefore such
>> an approach would be in addition to the proposal here. Am I mistaken?
> 
> If the idea is to use these instructions for function implementation selection
> (iFUNC), the idea is to have the information at PLT resolution, either by
> accessing it directly or by using a caching mechanism. x86_64 does something
> similar with cacheline information: it issues a single cpuid and creates a
> processor information table from the result (which is also what
> __builtin_cpu_supports() does).
> 

__builtin_cpu_supports is not a single cpuid on x86: it is
a cpuid per dso, with one cache per dso.

(gcc-5 used a single cache in libgcc_s.so.1 and that
turned out to be broken because ifunc in other dsos
could not reliably access it.)
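
for illustration, a minimal sketch of how user code queries that
per-dso cache through the gcc builtins __builtin_cpu_init and
__builtin_cpu_supports. the function name cpu_has_sse2 is made up
for this example, and the non-x86 fallback is only there so the
snippet builds elsewhere:

```c
#include <stdbool.h>

/* hypothetical helper: reports whether the running cpu has sse2.
   on x86 the builtins read a feature cache that belongs to the
   dso this code is linked into. */
bool cpu_has_sse2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* fills the cache; needed when the check can run before
       constructors have run, e.g. from an ifunc resolver */
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    /* assumption for this sketch: report "no" on other targets */
    return false;
#endif
}
```

each dso that contains such a check ends up filling its own cache.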

>>> The other idea was to add a vDSO function that returns this data so as
>>> to avoid (or at least reduce) the context switch latency.
>>
>> I'm not at all keen on adding a data ABI to the vDSO. I think people tried
>> similar things in the past (something on PPC?) and have horror stories
>> from that.
> 
> In fact ppc still exports it in vDSO (include/asm/vdso_datapage.h), with
> information like the LPAR cfg, platform, processor, {d,i}cache, etc.
> I recall that I have seen some code back at IBM that tried to use these
> fields directly, but indeed it is not recommended.
> 
> What I have in mind is something like what ppc does with
> __kernel_get_syscall_map. It is a vDSO function that returns vDSO-internal
> data describing which syscalls are implemented in the running kernel
> (through a bitmap field).
> 

fs access or vdso does not work for ifunc based dispatch
(assuming the current ifunc implementation in glibc).

(for vdso you need the AT_SYSINFO_EHDR auxval somehow and
then implement elf symbol lookup in the ifunc resolver
without calling any libc function. passing auxvals to the
ifunc resolver can be done by changing the ifunc abi, but
doing symbol lookups there is unrealistic.)

in the libc (e.g. for memcpy) ifunc is a bit easier to use,
but in user code (function-multi-versioning) ifunc is very
limited.

i wrote about the ifunc limitations here:
https://sourceware.org/ml/libc-alpha/2015-11/msg00108.html
see points (4) and (5).
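
for reference, a minimal sketch of how ordinary user code (running
after libc startup, so not inside an ifunc resolver) could test the
two hwcap bits proposed in the patch. it assumes glibc's getauxval
from <sys/auxv.h>. the helper names hwcap_has and
report_half_precision are made up for this example, and the bit
values are copied from the quoted patch:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP; glibc 2.16 or later */

/* bit values as proposed in the patch quoted at the top of the thread;
   guarded because the installed headers may already define them */
#ifndef HWCAP_FPHP
#define HWCAP_FPHP     (1UL << 9)
#endif
#ifndef HWCAP_ASIMDHP
#define HWCAP_ASIMDHP  (1UL << 10)
#endif

/* hypothetical helper: true if every bit of mask is set in hwcap */
bool hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}

/* hypothetical helper: only meaningful on arm64, on other targets
   bits 9 and 10 of AT_HWCAP mean something else */
void report_half_precision(void)
{
    unsigned long hwcap = getauxval(AT_HWCAP);

    printf("fphp: %d\n", hwcap_has(hwcap, HWCAP_FPHP));
    printf("asimdhp: %d\n", hwcap_has(hwcap, HWCAP_ASIMDHP));
}
```

this calls into libc, so it does not address the ifunc resolver
problem described above.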