[PATCH v2 00/11] arm64: vdso: getcpu() support
Mark Brown
broonie at kernel.org
Wed Jul 1 16:28:35 EDT 2020
This series is a rebase of the previously posted getcpu() support, with
additional patches 5-10 which attempt some cleanups and clarifications
of the vDSO code and extend it to multi-page support. Those patches are
currently drafts and haven't been fully tested or considered. They are
posted because there was some discussion of other applications of the
per-CPU data, so it seemed useful to share this in-progress work.
Some applications, especially tracing ones, benefit from avoiding the
syscall overhead for getcpu() so it is common for architectures to have
vDSO implementations. Add one for arm64, using TPIDRRO_EL0 to pass a
pointer to per-CPU data rather than just store the immediate value in
order to allow for future extensibility.
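As a rough illustration of the pointer-based approach described above,
here is a minimal user-space model. It is a sketch under assumptions, not
code from the series: the structure layout and every sketch_* name are
hypothetical, and a plain variable stands in for the TPIDRRO_EL0 register
so that the example runs on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU record; the real layout is defined by the series. */
struct sketch_vdso_cpu_data {
	unsigned int cpu;
	unsigned int node;
};

/*
 * Stand-in for TPIDRRO_EL0. In a real vDSO the pointer would be read
 * from the system register; here it is an ordinary variable.
 */
static const struct sketch_vdso_cpu_data *sketch_tpidrro;

static int sketch_vdso_getcpu(unsigned int *cpu, unsigned int *node)
{
	const struct sketch_vdso_cpu_data *pcpu = sketch_tpidrro;

	/* The model assumes the pointer was set up before the first call. */
	assert(pcpu != NULL);

	/* Both outputs are optional, as with the getcpu() system call. */
	if (cpu)
		*cpu = pcpu->cpu;
	if (node)
		*node = pcpu->node;

	return 0;
}
```

Because the register holds a pointer rather than the CPU number itself,
further fields could be added to the record later without changing how
the register is used.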
It is questionable whether something TPIDRRO_EL0 based is worthwhile at
all on current kernels: since v4.18 we have had support for restartable
sequences, which can be used to provide a sched_getcpu() implementation
with generally better performance than the vDSO approach on
architectures which have that[1]. Work is ongoing to implement this for
glibc:
https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/
but is not yet merged and will need similar work for other userspaces.
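From user space the interface under discussion looks roughly as follows.
This is a minimal sketch, not code from the series: the sketch_* helper
names are made up for illustration, and whether sched_getcpu() ends up
being served by rseq, the vDSO or a plain system call depends on the libc
version and the architecture.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for sched_getcpu() and syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Raw system call: always enters the kernel, which is the overhead a
 * vDSO or rseq based implementation is meant to avoid.
 */
static int sketch_getcpu_syscall(unsigned int *cpu, unsigned int *node)
{
	/* This helper requires both outputs, unlike the system call. */
	assert(cpu != NULL && node != NULL);

	/* The third argument is unused on current kernels. */
	return (int)syscall(SYS_getcpu, cpu, node, NULL);
}

/* libc wrapper: the fast path, if any, is chosen by the C library. */
static int sketch_getcpu_libc(void)
{
	return sched_getcpu();
}
```

A thread can migrate between two calls, so the two helpers are not
guaranteed to report the same CPU unless the thread is pinned.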
The main advantages for the vDSO implementation are the node parameter
(though this is a static mapping to CPU number so could be looked up
separately when processing data if it's needed, it shouldn't need to be
in the hot path) and ease of implementation for users.
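The point about the node being a static mapping can be sketched as
follows: record only the CPU number in the hot path and resolve the node
later, while processing the data. The table contents, its size and all
sketch_* names are hypothetical values chosen for the example; a real
tool would fill the table once from the system topology.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical CPU -> NUMA node table, captured once outside the hot path. */
static const unsigned int sketch_cpu_to_node[SKETCH_NR_CPUS] = { 0, 0, 1, 1 };

/* A trace record stores only the CPU number at sample time. */
struct sketch_sample {
	unsigned int cpu;
};

/* Resolve the node afterwards, when the recorded data is processed. */
static unsigned int sketch_sample_node(const struct sketch_sample *s)
{
	assert(s != NULL && s->cpu < SKETCH_NR_CPUS);

	return sketch_cpu_to_node[s->cpu];
}
```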
This is currently not compatible with KPTI due to the use of TPIDRRO_EL0
by the KPTI trampoline. This could be addressed by reinitializing that
system register in the return path, but I have found it hard to justify
adding that overhead for all users for something that is essentially a
profiling optimization which is likely to get superseded by a more
modern implementation. If there are other uses for the per-CPU data
then the balance might change here.
There is some overlap with an in-flight patch series from Andrei Vagin
supporting time namespaces in the vDSO, but there shouldn't be a
fundamental issue integrating the two series.
This builds on work done by Kristina Martsenko some time ago but is a
new implementation.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7822b1e24f2df5df98c76f0e94a5416349ff759
v2:
- Rebase on v5.8-rc3.
- Add further cleanup patches & a first draft of multi-page support.
Mark Brown (11):
arm64: vdso: Provide a define when building the vDSO
arm64: vdso: Add per-CPU data
arm64: vdso: Initialise the per-CPU vDSO data
arm64: vdso: Add getcpu() implementation
arm64: vdso: Remove union in declaration of the data store
arm64: vdso: Document and verify alignment of vDSO text
arm64: vdso: Rename vdso_pages to vdso_text_pages
arm64: vdso: Simplify pagelist allocation
arm64: vdso: Parameterise vDSO data length assumptions in code
arm64: vdso: Support multiple pages of vDSO data
selftests: vdso: Support arm64 in getcpu() test
 arch/arm64/include/asm/processor.h            |  12 +-
 arch/arm64/include/asm/vdso.h                 |  11 ++
 arch/arm64/include/asm/vdso/datapage.h        |  54 +++++++++
 arch/arm64/kernel/process.c                   |  26 +++-
 arch/arm64/kernel/vdso.c                      | 112 ++++++++++++------
 arch/arm64/kernel/vdso/Makefile               |   4 +-
 arch/arm64/kernel/vdso/vdso.lds.S             |   3 +-
 arch/arm64/kernel/vdso/vgetcpu.c              |  48 ++++++++
 .../testing/selftests/vDSO/vdso_test_getcpu.c |  10 ++
9 files changed, 229 insertions(+), 51 deletions(-)
create mode 100644 arch/arm64/include/asm/vdso/datapage.h
create mode 100644 arch/arm64/kernel/vdso/vgetcpu.c
base-commit: 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68
--
2.20.1