[PATCHv2] ARM64: Add AT_ARM64_MIDR to the aux vector

Tue Sep 1 17:28:28 PDT 2015

> On Sep 2, 2015, at 3:13 AM, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> 
> On Wed, 2 Sep 2015 01:58:56 +0800
> pinskia at gmail.com wrote:
> 
>>> On Sep 2, 2015, at 1:30 AM, Mark Rutland <mark.rutland at arm.com> wrote:
>>> 
>>> [...]
>>> 
>>>>>>> On Sat, Aug 29, 2015 at 07:46:22PM +0100, Andrew Pinski wrote:
>>>>>>> It is useful to pass the MIDR register down to userland if all of
>>>>>>> the online cores are the same type.  This adds the AT_ARM64_MIDR
>>>>>>> aux vector type and passes down the MIDR system register.
>>>>>>> 
>>>>>>> This is an alternative to the MIDR_EL1 part of
>>>>>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358995.html.
>>>>>>> It allows for faster access to midr_el1 than going through a trap, and
>>>>>>> the entry does not exist if the online cores are not all the same.
>>>>>> 
>>>>>> I'm not sure I follow the rationale. If speed is important the
>>>>>> application can cache the value the first time it reads it with a trap.
>>>>> 
>>>>> It is also about compatibility. Exposing the register is not backwards compatible, but using the aux vector is.
>>>> 
>>>> That would break big.LITTLE too. So either break it with hotplug or break it in userland; your choice.
>>> 
>>> The value wouldn't be representative of the system as a whole; that is
>>> true. However, we never guaranteed that it was, while the aux vector
>>> code implied that we did.
>> 
>> Yes, but I guess you are talking about caching the value in userspace,
>> and doing it via the aux vector is the same as your suggestion. The one
>> difference is that you don't get the aux vector entry if an online CPU
>> differs from the others; otherwise it is no different from your
>> suggestion of caching it. Leaving hotplug aside for a second (that is a
>> huge, separate issue altogether), if userland wants to know whether all
>> online CPUs have the same MIDR, it would either read /sys entries (lots
>> of syscalls) or bind to each CPU and do the trap. That means at least
>> two or three syscalls/traps for each CPU. My way needs none and gets
>> the MIDR value for free if they are all the same.
> 
> Andrew, how do you propose to get the value of MIDR? Open the
> "/proc/self/auxv", read it, do a linear search in the buffer to find
> the required entry and then read the value? Or use the glibc-specific
> getauxval() function (https://lwn.net/Articles/519085)?

I am talking about use inside glibc, so getauxval().
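
A minimal sketch of that lookup, assuming the AT_ARM64_MIDR entry from the
patch is present. The numeric value of AT_ARM64_MIDR is not stated in this
thread, so the sketch takes it as a parameter; midr_decode() and
midr_from_auxv() are illustrative names, not glibc functions. The field
split follows the architectural MIDR_EL1 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Fields of MIDR_EL1 as laid out by the architecture. */
struct midr_fields {
    unsigned implementer;  /* bits [31:24] */
    unsigned variant;      /* bits [23:20] */
    unsigned architecture; /* bits [19:16] */
    unsigned partnum;      /* bits [15:4]  */
    unsigned revision;     /* bits [3:0]   */
};

/* Split a raw MIDR value into its fields. */
struct midr_fields midr_decode(uint64_t midr)
{
    struct midr_fields f;
    f.implementer  = (unsigned)((midr >> 24) & 0xff);
    f.variant      = (unsigned)((midr >> 20) & 0xf);
    f.architecture = (unsigned)((midr >> 16) & 0xf);
    f.partnum      = (unsigned)((midr >> 4) & 0xfff);
    f.revision     = (unsigned)(midr & 0xf);
    return f;
}

/*
 * HYPOTHETICAL helper: at_arm64_midr is whatever number the kernel
 * assigns to AT_ARM64_MIDR (an assumption, supplied by the caller).
 * Returns 1 and stores the value if the entry exists, 0 otherwise.
 * getauxval() returns 0 for a missing entry, which is how the
 * "cores differ, so no entry" case would show up here.
 */
int midr_from_auxv(unsigned long at_arm64_midr, uint64_t *out)
{
    assert(out != NULL);
    unsigned long v = getauxval(at_arm64_midr);
    if (v == 0)
        return 0;
    *out = v;
    return 1;
}
```

A caller such as an ifunc resolver would try midr_from_auxv() first and
fall back to a generic code path when it returns 0, with no extra syscalls
in either case.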

> 
> Regarding the caching implementation, one can open and parse the
> "/proc/cpuinfo" file (with older kernels) or read the new sysfs
> entries to get the MIDR value for each core. Then create a lookup
> table. As an additional bonus, this lookup table can contain not
> just the MIDR values, but any arbitrary data in any format (for
> example, a function pointer to the memcpy function or anything else).

You don't want to do that early in ld.so each time a program starts up; it is too much overhead.

> 
> After the lookup table is available, one can use the getcpu() syscall
> for getting the CPU number and do the table lookup. And for getting
> reasonable performance, implement the vdso variant of the getcpu()
> syscall.
> 
> All of this internal ugliness would be best abstracted inside
> of the GCC __builtin_cpu_init(), __builtin_cpu_is() and
> __builtin_cpu_supports() builtins:
>    http://gcc.gnu.org/gcc-4.8/changes.html

Yes, but this is about glibc support, not other userland support. Having glibc depend on those builtins is even worse.

Thanks,
Andrew

> 
> On big.LITTLE systems, the __builtin_cpu_is() could be implemented
> via a single getcpu() syscall and the table lookup, as explained
> above. The __builtin_cpu_init() could prepare the lookup table.
> And on normal systems with identical cores, the use of the syscall
> is not required.
> 
> It might be interesting to also optionally allow something like this:
>    __builtin_cpu_is("cortex-a7 || cortex-a15")
> Which would mean that we are interested in checking for the
> Cortex-A7+Cortex-A15 pair in a big.LITTLE system, but are not
> interested in knowing whether we are running on the A7 or the A15 at
> this particular moment (and so avoid the syscall overhead).
> 
> We had an old discussion on a similar CPU type identification topic
> in the past:
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/220542.html
> I have been told that it had been forwarded to the Linaro toolchain
> people, but I did not track whether this resulted in anything useful.
> 
> I think that it would be best to prefer something that is easily usable
> by all applications and libraries, and not just something for private
> use by glibc. To sum everything up:
> 
> On the kernel side it means:
>  1. Maybe implement a vDSO variant of getcpu(); this will make things
>     faster on big.LITTLE systems.
>  2. Maybe implement sysfs entries for per-core MIDR values from
>       http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/359127.html
>     This will make things faster and avoid the potentially
>     messy and cumbersome /proc/cpuinfo text parsing.
> 
> On the GCC side it means:
>  1. Implement __builtin_cpu_init(), __builtin_cpu_is() and
>     __builtin_cpu_supports() builtins, which rely on reading sysfs
>     entries (with a fallback to /proc/cpuinfo parsing on old kernels)
>     and the getcpu() syscall for reasonably accurate core type
>     identification at runtime on big.LITTLE systems.
> 
> On the applications/libraries side (including, but not limited to glibc)
> it means:
>  1. Rely on the GCC __builtin_cpu_init(), __builtin_cpu_is() and
>     __builtin_cpu_supports() builtins.
>  2. Maybe implement a replacement for these builtins to get the
>     same functionality even with old versions of GCC.
> 
> -- 
> Best regards,
> Siarhei Siamashka
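
For reference, the lookup-table scheme described in the quoted message
could be sketched roughly as follows. This is only an illustration under
stated assumptions: the per-CPU path format is supplied by the caller
(the thread only refers to proposed sysfs entries, so no layout is assumed
here), each file is assumed to hold one hexadecimal value, and all names
(cpu_table, parse_midr, table_init, table_uniform, table_current) are
hypothetical. sched_getcpu() is the glibc wrapper for getcpu().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_CPUS 64 /* arbitrary bound for this sketch */

struct cpu_table {
    uint64_t midr[MAX_CPUS]; /* one MIDR value per CPU number */
    int ncpus;               /* number of valid entries */
};

/* Parse one hex line such as "0x00000000410fd034\n". 0 on success. */
int parse_midr(const char *text, uint64_t *out)
{
    char *end;
    unsigned long long v = strtoull(text, &end, 16);
    if (end == text)
        return -1; /* no digits at all */
    *out = (uint64_t)v;
    return 0;
}

/*
 * Fill the table once at startup. path_fmt must contain a single %d
 * for the CPU number; its exact form is an assumption of the caller.
 */
int table_init(struct cpu_table *t, const char *path_fmt, int ncpus)
{
    char path[256], line[64];

    assert(t != NULL && path_fmt != NULL);
    if (ncpus > MAX_CPUS)
        ncpus = MAX_CPUS;
    t->ncpus = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        snprintf(path, sizeof path, path_fmt, cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        int ok = fgets(line, sizeof line, f) != NULL &&
                 parse_midr(line, &t->midr[cpu]) == 0;
        fclose(f);
        if (!ok)
            return -1;
        t->ncpus = cpu + 1;
    }
    return 0;
}

/* 1 if all recorded CPUs share one MIDR, so no per-CPU query is needed. */
int table_uniform(const struct cpu_table *t)
{
    for (int i = 1; i < t->ncpus; i++)
        if (t->midr[i] != t->midr[0])
            return 0;
    return t->ncpus > 0;
}

/*
 * MIDR for the CPU we are running on. On mixed systems the answer can
 * be stale as soon as the thread migrates, so it is only a hint.
 */
int table_current(const struct cpu_table *t, uint64_t *out)
{
    int cpu = table_uniform(t) ? 0 : sched_getcpu();
    if (cpu < 0 || cpu >= t->ncpus)
        return -1;
    *out = t->midr[cpu];
    return 0;
}
```

The cost discussed in this thread is visible in table_init(): one open,
read and close per CPU at startup, which is what the aux vector entry
would avoid on systems with identical cores.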