[RESEND RFC PATCH v2] arm64: Exposes support for 32-bit syscalls
Steven Price
steven.price at arm.com
Fri Feb 12 06:30:41 EST 2021
On 11/02/2021 20:21, sonicadvance1 at gmail.com wrote:
> From: Ryan Houdek <Sonicadvance1 at gmail.com>
>
> Sorry about the noise. I obviously don't work in this ecosystem.
> Didn't get any comments previously so I'm resending
We're just coming up to a merge window, so I expect people are fairly
busy at the moment. Also, from a reviewability perspective, I think you
need to split this up into several patches with logical changes; as it
stands, the actual code changes are hard to review.
> The problem:
> We need to support 32-bit processes running under a userspace
> compatibility layer. The compatibility layer is a AArch64 process.
> This means exposing the 32bit compatibility syscalls to userspace.
I'm not sure how you come to this conclusion. Running 32-bit processes
under a compatibility layer is a fine goal, but it's not clear why the
entire 32-bit compat syscall layer is needed for this.
As a case in point, QEMU's user mode emulation already achieves this in
many cases without any changes to the kernel.
> Why do we need compatibility layers?
> There are ARMv8 CPUs that only support AArch64 but still need to run
> AArch32 applications.
> Cortex-A34/R82 and other cores are prime examples of this.
> Additionally if a user is needing to run legacy 32-bit x86 software, it
> needs the same compatibility layer.
Unless I'm much mistaken, QEMU's user mode already does this - admittedly
I don't tend to run "legacy 32-bit x86 software".
> Who does this matter to?
> Any user that has a specific need to run legacy 32-bit software under a
> compatibility layer.
> Not all software is open source or easy to convert to 64bit, it's
> something we need to live with.
> Professional software and the gaming ecosystem is rife with this.
>
> What applications have tried to work around this problem?
> FEX emulator (1) - Userspace x86 to AArch64 compatibility layer
> Tango binary translator (2) - AArch32 to AArch64 compatibility layer
> QEmu (3) - Not really but they do some userspace ioctl emulation
Can you expand on "not really"? Clearly there are limitations, but in
general I can happily "chroot" into a distro filesystem for an otherwise
incompatible architecture using a qemu-xxx-static binary.
> What problems did they hit?
> FEX and Tango hit problems with emulating memory related syscalls.
> - Emulating 32-bit mmap, mremap, shmat in userspace changes behaviour
> All three hit issues with ioctl emulation
> - ioctls are free to do what they want including allocating memory and
> returning opaque structures with pointers.
Now I think we're getting to what the actual problems are:
* mmap and friends have no (easy) way of forcing a mapping into a 32-bit
  region.
* ioctls are a mess
Solving the first seems like a reasonable goal - I've seen examples of
MAP_32BIT being (ab)used for this, but it actually restricts mappings to
31 bits and it's not even available on arm64. Here I think you'd be better
off focusing on coming up with a new (generic) way of restricting the
addresses that the kernel will pick.
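To illustrate that first point, this is roughly the best userspace can do
on arm64 today - just a sketch I've thrown together, not code from any of
the projects above: pass a hint and then check whether the kernel honoured
it.

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

static void *map_below_4g(size_t len, uintptr_t hint)
{
	/* Ask for the mapping at 'hint', somewhere below 4GiB. */
	void *p = mmap((void *)hint, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* The hint is only a hint: the kernel is free to place the mapping
	 * anywhere, including above 4GiB, so the result has to be checked
	 * and the whole thing retried with a different hint on failure.
	 */
	if ((uintptr_t)p + len > (1ULL << 32)) {
		munmap(p, len);
		return NULL;
	}
	return p;
}

That works after a fashion, but it races with the process's own
allocations and gets ugly fast - which is exactly the sort of thing a
generic kernel interface for restricting the search range could remove.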
ioctls are going to be a problem whatever you do, and I don't think
there is much option other than having a list of known ioctls and
translating them in user space - see below.
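By way of illustration, the user-space translation I have in mind looks
something like this (the names and the thunk are made up, it's just a
sketch of the shape):

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per ioctl the translator knows how to handle. The thunk
 * repacks the guest's 32-bit structure into the native 64-bit layout,
 * issues the native ioctl and writes the result back.
 */
struct ioctl_thunk {
	uint32_t compat_request;
	long (*translate)(int fd, void *compat_arg);
};

/* Hypothetical thunk - the interesting work happens in functions like this. */
extern long thunk_fs_getflags(int fd, void *compat_arg);

static const struct ioctl_thunk thunks[] = {
	{ 0x80046601, thunk_fs_getflags },	/* e.g. FS_IOC32_GETFLAGS */
	/* ... */
};

long guest_ioctl(int fd, uint32_t request, void *compat_arg)
{
	for (size_t i = 0; i < sizeof(thunks) / sizeof(thunks[0]); i++)
		if (thunks[i].compat_request == request)
			return thunks[i].translate(fd, compat_arg);

	/* Unknown ioctls are refused rather than passed through, because a
	 * mistranslated structure layout can corrupt memory on either side.
	 */
	return -ENOTTY;
}

Once you have that allow-list you've already done most of the hard work,
regardless of what the kernel exposes.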
> With this patch we will be exposing the compatibility syscall table
> through the regular syscall svc API. There is prior art here where on
> x86-64 they also expose the compatibility tables.
> The justification for this is that we need to maintain support for 32bit
> application compatibility going in to the foreseeable future.
> Userspace does almost all of the heavy lifting here, especially when the
> hardware no longer supports the 32bit use case.
>
> A couple of caveats to this approach.
> Userspace must know that this doesn't solve structure alignment problems
> for the x86->AArch64 (1) case.
> The API for this changes from syscall number in x8 to x7 to match
> AArch32 syscall semantics
This is where the argument for exposing compat falls down - for one of
the main use cases (x86->aarch64) you still need to do a load of fixups
in user space due to the differing alignment/semantics of the
architectures. It's not clear to me why you can't just convert the
arguments to the full 64-bit native ioctls at the same time. You are
already going to need an allow-list of ioctls that are handled, because
any unknown ioctl is likely to blow up in strange ways due to the likes
of structure alignment differences.
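For what it's worth, my reading of the do_el0_svc() change below is that
a translator would end up invoking the compat table with something like
this (a sketch only, based on the patch rather than anything tested):

/* 32-bit syscall number in x7 (AArch32 style), arguments in x0 upwards
 * as usual, and the SVC immediate (ISS == 1) selects the compat table.
 */
static inline long compat_syscall3(long nr, long a, long b, long c)
{
	register long x0 asm("x0") = a;
	register long x1 asm("x1") = b;
	register long x2 asm("x2") = c;
	register long x7 asm("x7") = nr;

	asm volatile("svc #1"
		     : "+r"(x0)
		     : "r"(x1), "r"(x2), "r"(x7)
		     : "memory");
	return x0;
}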
> This is now exposing the compat syscalls to userspace, but for the sake
> of userspace compatibility it is a necessary evil.
You've yet to convince me that it's "necessary" - I agree on the "evil"
part ;)
> Why does the upstream kernel care?
> I believe every user wants to have their software ecosystem continue
> working if they are in a mixed AArch32/AArch64 world even when they are
> running AArch64 only hardware. The kernel should facilitate a good user
> experience.
I fully agree on the goal - I just think you need more justification for
the approach you are taking.
Steve
> External Resources
> (1) https://github.com/FEX-Emu/FEX
> (2) https://www.amanieusystems.com/
> (3) https://www.qemu.org/
>
> Further reading
> - https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes
> - Original patch: https://github.com/Amanieu/linux/commit/b4783002afb0
>
> Changes in v2:
> - Removed a tangential code path to make this more concise
> - Now doesn't cover Tango's full use case
> - This is purely for conciseness sake, easy enough to add back
> - Cleaned up commit message
> Signed-off-by: Ryan Houdek <Sonicadvance1 at gmail.com>
> ---
>  arch/arm64/Kconfig                   |   9 +
>  arch/arm64/include/asm/compat.h      |  20 +++
>  arch/arm64/include/asm/exception.h   |   2 +-
>  arch/arm64/include/asm/mmu.h         |   7 +
>  arch/arm64/include/asm/pgtable.h     |  10 ++
>  arch/arm64/include/asm/processor.h   |   6 +-
>  arch/arm64/include/asm/thread_info.h |   7 +
>  arch/arm64/kernel/asm-offsets.c      |   3 +
>  arch/arm64/kernel/entry-common.c     |   9 +-
>  arch/arm64/kernel/fpsimd.c           |   2 +-
>  arch/arm64/kernel/hw_breakpoint.c    |   2 +-
>  arch/arm64/kernel/perf_regs.c        |   2 +-
>  arch/arm64/kernel/process.c          |  13 +-
>  arch/arm64/kernel/ptrace.c           |   6 +-
>  arch/arm64/kernel/signal.c           |   2 +-
>  arch/arm64/kernel/syscall.c          |  41 ++++-
>  arch/arm64/mm/mmap.c                 | 249 +++++++++++++++++++++++++++
>  17 files changed, 369 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1515f6f153a0..9832f05daaee 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1147,6 +1147,15 @@ config XEN
> help
> Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
>
> +config ARM_COMPAT_DISPATCH
> + bool "32bit syscall dispatch table"
> + depends on COMPAT && ARM64
> + default y
> + help
> + Kernel support for exposing the 32-bit syscall dispatch table to
> + userspace.
> + For dynamically translating 32-bit applications to a 64-bit process.
> +
> config FORCE_MAX_ZONEORDER
> int
> default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
> index 23a9fb73c04f..d00c6f427999 100644
> --- a/arch/arm64/include/asm/compat.h
> +++ b/arch/arm64/include/asm/compat.h
> @@ -180,10 +180,30 @@ struct compat_shmid64_ds {
>
> static inline int is_compat_task(void)
> {
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + /* It is compatible if Tango, 32bit compat, or 32bit thread */
> + return current_thread_info()->compat_syscall_flags != 0 || test_thread_flag(TIF_32BIT);
> +#else
> return test_thread_flag(TIF_32BIT);
> +#endif
> }
>
> static inline int is_compat_thread(struct thread_info *thread)
> +{
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + /* It is compatible if Tango, 32bit compat, or 32bit thread */
> + return thread->compat_syscall_flags != 0 || test_ti_thread_flag(thread, TIF_32BIT);
> +#else
> + return test_ti_thread_flag(thread, TIF_32BIT);
> +#endif
> +}
> +
> +static inline int is_aarch32_compat_task(void)
> +{
> + return test_thread_flag(TIF_32BIT);
> +}
> +
> +static inline int is_aarch32_compat_thread(struct thread_info *thread)
> {
> return test_ti_thread_flag(thread, TIF_32BIT);
> }
> diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
> index 99b9383cd036..f2c94b44b51c 100644
> --- a/arch/arm64/include/asm/exception.h
> +++ b/arch/arm64/include/asm/exception.h
> @@ -45,7 +45,7 @@ void do_sysinstr(unsigned int esr, struct pt_regs *regs);
> void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs);
> void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr);
> void do_cp15instr(unsigned int esr, struct pt_regs *regs);
> -void do_el0_svc(struct pt_regs *regs);
> +void do_el0_svc(struct pt_regs *regs, unsigned int iss);
> void do_el0_svc_compat(struct pt_regs *regs);
> void do_ptrauth_fault(struct pt_regs *regs, unsigned int esr);
> #endif /* __ASM_EXCEPTION_H */
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index b2e91c187e2a..0744db65c0a9 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -27,6 +27,9 @@ typedef struct {
> refcount_t pinned;
> void *vdso;
> unsigned long flags;
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + unsigned long compat_mmap_base;
> +#endif
> } mm_context_t;
>
> /*
> @@ -79,6 +82,10 @@ extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
> extern void mark_linear_text_alias_ro(void);
> extern bool kaslr_requires_kpti(void);
>
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> +extern void process_init_compat_mmap(void);
> +#endif
> +
> #define INIT_MM_CONTEXT(name) \
> .pgd = init_pg_dir,
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 4ff12a7adcfd..5e7662c2675c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -974,6 +974,16 @@ static inline bool arch_faults_on_old_pte(void)
> }
> #define arch_faults_on_old_pte arch_faults_on_old_pte
>
> +/*
> + * We provide our own arch_get_unmapped_area to handle 32-bit mmap calls from
> + * tango.
> + */
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> +#define HAVE_ARCH_UNMAPPED_AREA
> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> +#endif
> +
> #endif /* !__ASSEMBLY__ */
>
> #endif /* __ASM_PGTABLE_H */
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index fce8cbecd6bc..03c05cd19f87 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -175,7 +175,7 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset,
> #define task_user_tls(t) \
> ({ \
> unsigned long *__tls; \
> - if (is_compat_thread(task_thread_info(t))) \
> + if (is_aarch32_compat_thread(task_thread_info(t))) \
> __tls = &(t)->thread.uw.tp2_value; \
> else \
> __tls = &(t)->thread.uw.tp_value; \
> @@ -256,8 +256,8 @@ extern struct task_struct *cpu_switch_to(struct task_struct *prev,
> #define task_pt_regs(p) \
> ((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
>
> -#define KSTK_EIP(tsk) ((unsigned long)task_pt_regs(tsk)->pc)
> -#define KSTK_ESP(tsk) user_stack_pointer(task_pt_regs(tsk))
> +#define KSTK_EIP(tsk) ((unsigned long)task_pt_regs(tsk)->pc)
> +#define KSTK_ESP(tsk) user_stack_pointer(task_pt_regs(tsk))
>
> /*
> * Prefetching support
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 1fbab854a51b..cb04c7c4df38 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -41,6 +41,9 @@ struct thread_info {
> #endif
> } preempt;
> };
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + int compat_syscall_flags; /* 32-bit compat syscall */
> +#endif
> #ifdef CONFIG_SHADOW_CALL_STACK
> void *scs_base;
> void *scs_sp;
> @@ -107,6 +110,10 @@ void arch_release_task_struct(struct task_struct *tsk);
> _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> _TIF_SYSCALL_EMU)
>
> +#define TIF_COMPAT_32BITSYSCALL 0 /* Trivial 32bit compatible syscall */
> +
> +#define _TIF_COMPAT_32BITSYSCALL (1 << TIF_COMPAT_32BITSYSCALL)
> +
> #ifdef CONFIG_SHADOW_CALL_STACK
> #define INIT_SCS \
> .scs_base = init_shadow_call_stack, \
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 7d32fc959b1a..742203cff128 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -34,6 +34,9 @@ int main(void)
> #ifdef CONFIG_ARM64_SW_TTBR0_PAN
> DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
> #endif
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + DEFINE(TI_COMPAT_SYSCALL, offsetof(struct task_struct, thread_info.compat_syscall_flags));
> +#endif
> #ifdef CONFIG_SHADOW_CALL_STACK
> DEFINE(TSK_TI_SCS_BASE, offsetof(struct task_struct, thread_info.scs_base));
> DEFINE(TSK_TI_SCS_SP, offsetof(struct task_struct, thread_info.scs_sp));
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 43d4c329775f..6d98a9c6fafd 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -228,12 +228,12 @@ static void notrace el0_dbg(struct pt_regs *regs, unsigned long esr)
> }
> NOKPROBE_SYMBOL(el0_dbg);
>
> -static void notrace el0_svc(struct pt_regs *regs)
> +static void notrace el0_svc(struct pt_regs *regs, unsigned int iss)
> {
> if (system_uses_irq_prio_masking())
> gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
>
> - do_el0_svc(regs);
> + do_el0_svc(regs, iss);
> }
> NOKPROBE_SYMBOL(el0_svc);
>
> @@ -251,7 +251,10 @@ asmlinkage void notrace el0_sync_handler(struct pt_regs *regs)
>
> switch (ESR_ELx_EC(esr)) {
> case ESR_ELx_EC_SVC64:
> - el0_svc(regs);
> + /* Redundant masking here to show we are getting ISS mask
> + * Then we are pulling the imm16 out of it for SVC64
> + */
> + el0_svc(regs, (esr & ESR_ELx_ISS_MASK) & 0xffff);
> break;
> case ESR_ELx_EC_DABT_LOW:
> el0_da(regs, esr);
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 062b21f30f94..a35ab449a466 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -937,7 +937,7 @@ void fpsimd_release_task(struct task_struct *dead_task)
> void do_sve_acc(unsigned int esr, struct pt_regs *regs)
> {
> /* Even if we chose not to use SVE, the hardware could still trap: */
> - if (unlikely(!system_supports_sve()) || WARN_ON(is_compat_task())) {
> + if (unlikely(!system_supports_sve()) || WARN_ON(is_aarch32_compat_task())) {
> force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0);
> return;
> }
> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index 712e97c03e54..37c9349c4999 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -168,7 +168,7 @@ static int is_compat_bp(struct perf_event *bp)
> * deprecated behaviour if we use unaligned watchpoints in
> * AArch64 state.
> */
> - return tsk && is_compat_thread(task_thread_info(tsk));
> + return tsk && is_aarch32_compat_thread(task_thread_info(tsk));
> }
>
> /**
> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
> index f6f58e6265df..c4b061f0d182 100644
> --- a/arch/arm64/kernel/perf_regs.c
> +++ b/arch/arm64/kernel/perf_regs.c
> @@ -66,7 +66,7 @@ int perf_reg_validate(u64 mask)
>
> u64 perf_reg_abi(struct task_struct *task)
> {
> - if (is_compat_thread(task_thread_info(task)))
> + if (is_aarch32_compat_thread(task_thread_info(task)))
> return PERF_SAMPLE_REGS_ABI_32;
> else
> return PERF_SAMPLE_REGS_ABI_64;
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index a47a40ec6ad9..9c0775babbd0 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -314,7 +314,7 @@ static void tls_thread_flush(void)
> {
> write_sysreg(0, tpidr_el0);
>
> - if (is_compat_task()) {
> + if (is_aarch32_compat_task()) {
> current->thread.uw.tp_value = 0;
>
> /*
> @@ -409,7 +409,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
> *task_user_tls(p) = read_sysreg(tpidr_el0);
>
> if (stack_start) {
> - if (is_compat_thread(task_thread_info(p)))
> + if (is_aarch32_compat_thread(task_thread_info(p)))
> childregs->compat_sp = stack_start;
> else
> childregs->sp = stack_start;
> @@ -453,7 +453,7 @@ static void tls_thread_switch(struct task_struct *next)
> {
> tls_preserve_current_state();
>
> - if (is_compat_thread(task_thread_info(next)))
> + if (is_aarch32_compat_thread(task_thread_info(next)))
> write_sysreg(next->thread.uw.tp_value, tpidrro_el0);
> else if (!arm64_kernel_unmapped_at_el0())
> write_sysreg(0, tpidrro_el0);
> @@ -619,7 +619,12 @@ unsigned long arch_align_stack(unsigned long sp)
> */
> void arch_setup_new_exec(void)
> {
> - current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + process_init_compat_mmap();
> + current_thread_info()->compat_syscall_flags = 0;
> +#endif
> +
> + current->mm->context.flags = is_aarch32_compat_task() ? MMCF_AARCH32 : 0;
>
> ptrauth_thread_init_user(current);
>
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index f49b349e16a3..2e3c242941d1 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -175,7 +175,7 @@ static void ptrace_hbptriggered(struct perf_event *bp,
> const char *desc = "Hardware breakpoint trap (ptrace)";
>
> #ifdef CONFIG_COMPAT
> - if (is_compat_task()) {
> + if (is_aarch32_compat_task()) {
> int si_errno = 0;
> int i;
>
> @@ -1725,7 +1725,7 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
> */
> if (is_compat_task())
> return &user_aarch32_view;
> - else if (is_compat_thread(task_thread_info(task)))
> + else if (is_aarch32_compat_thread(task_thread_info(task)))
> return &user_aarch32_ptrace_view;
> #endif
> return &user_aarch64_view;
> @@ -1906,7 +1906,7 @@ int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
> /* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
> user_regs_reset_single_step(regs, task);
>
> - if (is_compat_thread(task_thread_info(task)))
> + if (is_aarch32_compat_thread(task_thread_info(task)))
> return valid_compat_regs(regs);
> else
> return valid_native_regs(regs);
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index a8184cad8890..e6462b32effa 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -813,7 +813,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
> /*
> * Set up the stack frame
> */
> - if (is_compat_task()) {
> + if (is_aarch32_compat_task()) {
> if (ksig->ka.sa.sa_flags & SA_SIGINFO)
> ret = compat_setup_rt_frame(usig, ksig, oldset, regs);
> else
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index e4c0dadf0d92..6857dad5df8e 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -21,7 +21,7 @@ static long do_ni_syscall(struct pt_regs *regs, int scno)
> {
> #ifdef CONFIG_COMPAT
> long ret;
> - if (is_compat_task()) {
> + if (is_aarch32_compat_task()) {
> ret = compat_arm_syscall(regs, scno);
> if (ret != -ENOSYS)
> return ret;
> @@ -167,6 +167,9 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> local_daif_mask();
> flags = current_thread_info()->flags;
> if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP)) {
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + current_thread_info()->compat_syscall_flags = 0;
> +#endif
> /*
> * We're off to userspace, where interrupts are
> * always enabled after we restore the flags from
> @@ -180,6 +183,9 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
>
> trace_exit:
> syscall_trace_exit(regs);
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + current_thread_info()->compat_syscall_flags = 0;
> +#endif
> }
>
> static inline void sve_user_discard(void)
> @@ -199,10 +205,39 @@ static inline void sve_user_discard(void)
> sve_user_disable();
> }
>
> -void do_el0_svc(struct pt_regs *regs)
> +void do_el0_svc(struct pt_regs *regs, unsigned int iss)
> {
> sve_user_discard();
> - el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
> + /* XXX: Which style is more ideal to take here? */
> +#if 0
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + /* Hardcode syscall 0x8000'0000 to be a 32bit support syscall */
> + if (regs->regs[8] == 0x80000000) {
> + current_thread_info()->compat_syscall_flags = _TIF_COMPAT_32BITSYSCALL;
> + el0_svc_common(regs, regs->regs[7], __NR_compat_syscalls,
> + compat_sys_call_table);
> +
> + } else
> +#endif
> + el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
> +#else
> + switch (iss) {
> + /* SVC #1 is now a 32bit support syscall
> + * Any other SVC ISS falls down the regular syscall code path
> + */
> + case 1:
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> + current_thread_info()->compat_syscall_flags = _TIF_COMPAT_32BITSYSCALL;
> + el0_svc_common(regs, regs->regs[7], __NR_compat_syscalls,
> + compat_sys_call_table);
> +#else
> + return -ENOSYS;
> +#endif
> + break;
> + default:
> + el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
> + }
> +#endif
> }
>
> #ifdef CONFIG_COMPAT
> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
> index 3028bacbc4e9..857aa03a3ac2 100644
> --- a/arch/arm64/mm/mmap.c
> +++ b/arch/arm64/mm/mmap.c
> @@ -17,6 +17,8 @@
> #include <linux/io.h>
> #include <linux/personality.h>
> #include <linux/random.h>
> +#include <linux/security.h>
> +#include <linux/hugetlb.h>
>
> #include <asm/cputype.h>
>
> @@ -68,3 +70,250 @@ int devmem_is_allowed(unsigned long pfn)
> }
>
> #endif
> +
> +#ifdef CONFIG_ARM_COMPAT_DISPATCH
> +
> +/* Definitions for compat syscall guest mmap area */
> +#define COMPAT_MIN_GAP (SZ_128M)
> +#define COMPAT_STACK_TOP 0xffff0000
> +#define COMPAT_MAX_GAP (COMPAT_STACK_TOP/6*5)
> +#define COMPAT_TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE_32 / 4)
> +#define COMPAT_STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
> +
> +#ifndef arch_get_mmap_end
> +#define arch_get_mmap_end(addr) (TASK_SIZE)
> +#endif
> +
> +#ifndef arch_get_mmap_base
> +#define arch_get_mmap_base(addr, base) (base)
> +#endif
> +
> +static int mmap_is_legacy(unsigned long rlim_stack)
> +{
> + if (current->personality & ADDR_COMPAT_LAYOUT)
> + return 1;
> +
> + if (rlim_stack == RLIM_INFINITY)
> + return 1;
> +
> + return sysctl_legacy_va_layout;
> +}
> +
> +static unsigned long compat_mmap_base(unsigned long rnd, unsigned long gap)
> +{
> + unsigned long pad = stack_guard_gap;
> +
> + /* Account for stack randomization if necessary */
> + if (current->flags & PF_RANDOMIZE)
> + pad += (COMPAT_STACK_RND_MASK << PAGE_SHIFT);
> +
> + /* Values close to RLIM_INFINITY can overflow. */
> + if (gap + pad > gap)
> + gap += pad;
> +
> + if (gap < COMPAT_MIN_GAP)
> + gap = COMPAT_MIN_GAP;
> + else if (gap > COMPAT_MAX_GAP)
> + gap = COMPAT_MAX_GAP;
> +
> + return PAGE_ALIGN(COMPAT_STACK_TOP - gap - rnd);
> +}
> +
> +void process_init_compat_mmap(void)
> +{
> + unsigned long random_factor = 0UL;
> + unsigned long rlim_stack = rlimit(RLIMIT_STACK);
> +
> + if (current->flags & PF_RANDOMIZE) {
> + random_factor = (get_random_long() &
> + ((1UL << mmap_rnd_compat_bits) - 1)) << PAGE_SHIFT;
> + }
> +
> + if (mmap_is_legacy(rlim_stack)) {
> + current->mm->context.compat_mmap_base =
> + COMPAT_TASK_UNMAPPED_BASE + random_factor;
> + } else {
> + current->mm->context.compat_mmap_base =
> + compat_mmap_base(random_factor, rlim_stack);
> + }
> +}
> +
> +/* Get an address range which is currently unmapped.
> + * For shmat() with addr=0.
> + *
> + * Ugly calling convention alert:
> + * Return value with the low bits set means error value,
> + * ie
> + * if (ret & ~PAGE_MASK)
> + * error = ret;
> + *
> + * This function "knows" that -ENOMEM has the bits set.
> + */
> +unsigned long
> +arch_get_unmapped_area(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma, *prev;
> + struct vm_unmapped_area_info info;
> + const unsigned long mmap_end = arch_get_mmap_end(addr);
> + bool bad_addr = false;
> +
> + if (len > mmap_end - mmap_min_addr)
> + return -ENOMEM;
> +
> + /*
> + * Ensure that translated processes do not allocate the last
> + * page of the 32-bit address space, or anything above it.
> + */
> + if (is_compat_task())
> + bad_addr = addr + len > TASK_SIZE_32;
> +
> + if (flags & MAP_FIXED)
> + return bad_addr ? -ENOMEM : addr;
> +
> + if (addr && !bad_addr) {
> + addr = PAGE_ALIGN(addr);
> + vma = find_vma_prev(mm, addr, &prev);
> + if (mmap_end - len >= addr && addr >= mmap_min_addr &&
> + (!vma || addr + len <= vm_start_gap(vma)) &&
> + (!prev || addr >= vm_end_gap(prev)))
> + return addr;
> + }
> +
> + info.flags = 0;
> + info.length = len;
> + if (is_compat_task()) {
> + info.low_limit = mm->context.compat_mmap_base;
> + info.high_limit = TASK_SIZE_32;
> + } else {
> + info.low_limit = mm->mmap_base;
> + info.high_limit = mmap_end;
> + }
> + info.align_mask = 0;
> + return vm_unmapped_area(&info);
> +}
> +
> +/*
> + * This mmap-allocator allocates new areas top-down from below the
> + * stack's low limit (the base):
> + */
> +unsigned long
> +arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff,
> + unsigned long flags)
> +{
> +
> + struct vm_area_struct *vma, *prev;
> + struct mm_struct *mm = current->mm;
> + struct vm_unmapped_area_info info;
> + const unsigned long mmap_end = arch_get_mmap_end(addr);
> + bool bad_addr = false;
> +
> + /* requested length too big for entire address space */
> + if (len > mmap_end - mmap_min_addr)
> + return -ENOMEM;
> +
> + /*
> + * Ensure that translated processes do not allocate the last
> + * page of the 32-bit address space, or anything above it.
> + */
> + if (is_compat_task())
> + bad_addr = addr + len > TASK_SIZE_32;
> +
> + if (flags & MAP_FIXED)
> + return bad_addr ? -ENOMEM : addr;
> +
> + /* requesting a specific address */
> + if (addr && !bad_addr) {
> + addr = PAGE_ALIGN(addr);
> + vma = find_vma_prev(mm, addr, &prev);
> + if (mmap_end - len >= addr && addr >= mmap_min_addr &&
> + (!vma || addr + len <= vm_start_gap(vma)) &&
> + (!prev || addr >= vm_end_gap(prev)))
> + return addr;
> + }
> +
> + info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> + info.length = len;
> + info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> + if (is_compat_task())
> + info.high_limit = mm->context.compat_mmap_base;
> + else
> + info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);
> + info.align_mask = 0;
> + addr = vm_unmapped_area(&info);
> +
> + /*
> + * A failed mmap() very likely causes application failure,
> + * so fall back to the bottom-up function here. This scenario
> + * can happen with large stack limits and large mmap()
> + * allocations.
> + */
> + if (offset_in_page(addr)) {
> + VM_BUG_ON(addr != -ENOMEM);
> + info.flags = 0;
> + if (is_compat_task()) {
> + info.low_limit = COMPAT_TASK_UNMAPPED_BASE;
> + info.high_limit = TASK_SIZE_32;
> + } else {
> + info.low_limit = TASK_UNMAPPED_BASE;
> + info.high_limit = mmap_end;
> + }
> + addr = vm_unmapped_area(&info);
> + }
> +
> + return addr;
> +}
> +
> +unsigned long
> +hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> + struct hstate *h = hstate_file(file);
> + struct vm_unmapped_area_info info;
> + bool bad_addr = false;
> +
> + if (len & ~huge_page_mask(h))
> + return -EINVAL;
> + if (len > TASK_SIZE)
> + return -ENOMEM;
> +
> + /*
> + * Ensure that translated processes do not allocate the last
> + * page of the 32-bit address space, or anything above it.
> + */
> + if (is_compat_task())
> + bad_addr = addr + len > TASK_SIZE_32;
> +
> + if (flags & MAP_FIXED) {
> + if (prepare_hugepage_range(file, addr, len))
> + return -EINVAL;
> + return bad_addr ? -ENOMEM : addr;
> + }
> +
> + if (addr && !bad_addr) {
> + addr = ALIGN(addr, huge_page_size(h));
> + vma = find_vma(mm, addr);
> + if (TASK_SIZE - len >= addr &&
> + (!vma || addr + len <= vm_start_gap(vma)))
> + return addr;
> + }
> +
> + info.flags = 0;
> + info.length = len;
> + if (is_compat_task()) {
> + info.low_limit = COMPAT_TASK_UNMAPPED_BASE;
> + info.high_limit = TASK_SIZE_32;
> + } else {
> + info.low_limit = TASK_UNMAPPED_BASE;
> + info.high_limit = TASK_SIZE;
> + }
> + info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> + info.align_offset = 0;
> + return vm_unmapped_area(&info);
> +}
> +
> +#endif
>