[RFC] syscalls: Restore address limit after a syscall

Fri Feb 10 12:49:34 PST 2017

On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
<linux at armlinux.org.uk> wrote:
> On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie at google.com> wrote:
>> > So by default it is in the wrapper. If selected, an architecture can
>> > disable the wrapper put it in the best places. Understood correctly?
>>
>> Sounds good to me.
>>
>> Presumably the result should go through -mm.  Want to cc: akpm and
>> linux-arch@ on the next version?
>>
>> I've also cc'd arm and s390 folks -- those are the other arches that
>> try to be on top of hardening.
>
> The best place for this on ARM is in the assembly code, rather than in
> the hundreds of system calls - having it in one place is surely better
> for reducing the cache impact.
>
> This (untested) patch should be sufficient for ARM - there's two choices
> which I think make sense to do this:
> 1. Immediately after returning the syscall
> 2. Immediately before any returning to userspace
>
> (1) has the advantage that the address limit will be forced for the
> exit-path works that we do, preventing those making accesses to kernel
> space.
>
> (2) has the advantage that we'd guarantee that the address limit will
> be forced while userspace is running for the next entry into kernel
> space.
>
> There's actually a third option as well:
>
> (3) forcing the address limit on entry to the kernel from userspace.
>
> This patch implements option 1.
>
>  arch/arm/kernel/entry-common.S | 6 ++++++
>  1 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index eb5cd77bf1d8..6a717a2ccb88 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -39,6 +39,8 @@
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
>         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
>         b       ret_slow_syscall
>
>  __sys_trace_return:
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         mov     r0, sp
>         bl      syscall_trace_exit
>

That looks pretty great! If I'm reading the macros correctly, this'll
only happen on _actual_ syscall exit, right? So all the crazy OABI
stuff won't suddenly break? e.g.:

asmlinkage long sys_oabi_semtimedop(int semid,
...
                mm_segment_t fs = get_fs();
                set_fs(KERNEL_DS);
                err = sys_semtimedop(semid, sops, nsops, timeout);
                set_fs(fs);
...
        return err;
}

Is there a similarly good place to do this for arm64?

-Kees

-- 
Kees Cook
Pixel Security