revisit arm64 per-task stack canaries

Will Deacon will.deacon at arm.com
Tue Feb 13 10:04:36 PST 2018


Hi Ard,

On Tue, Feb 13, 2018 at 01:56:49PM +0000, Ard Biesheuvel wrote:
> On 13 February 2018 at 12:52, Mark Rutland <mark.rutland at arm.com> wrote:
> > On Tue, Feb 13, 2018 at 12:36:02PM +0000, Ard Biesheuvel wrote:
> >> Instead, I have proposed a proof of concept [0] where the compiler
> >> emits an instruction sequence that loads the canary directly from the
> >> task struct, which is the per-thread data structure maintained by the
> >> kernel. Accessing that can be done safely without any of the
> >> limitations per-CPU variables have. The task struct pointer is kept in
> >> system register sp_el0 while running in the kernel.
> >
> > My major concern here is being tied to using sysregs in a particular
> > way. We might want to fiddle with that in future (e.g. using the
> > platform register as an optimization, or switching to a different sysreg
> > due to architectural extensions).
> >
> 
> Yes. Also, there is a disparity between userland (using tpidr_el0) and
> kernel (using sp_el0), and perhaps it would make more sense to switch
> to tpidr_el0 in the kernel as well. But the general objection remains.

Indeed, and I share Mark's view that we don't want to commit to a specific
sequence here. Ideally, we'd have a way to pass whatever thunk we need to
the compiler and have the freedom to implement it as we see fit (and to
change that implementation at a whim).
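
As a rough illustration of that freedom, here is a userspace model, not
kernel code. It assumes the compiler only ever calls an accessor that
we supply. Apart from get_task_canary(), which Mark suggests below,
every name is made up for this sketch, and a plain global pointer
stands in for whatever register ends up holding the task pointer.

```c
/* Hypothetical stand-in for the task struct; only the canary matters here. */
struct task {
	unsigned long stack_canary;
};

/* Stand-in for the register holding the task pointer (sp_el0 today). */
static struct task *current_task;

/* Models a context switch: the next task brings its own canary. */
static void switch_to(struct task *next)
{
	current_task = next;
}

/*
 * The thunk. This is the only place that knows where the canary lives,
 * so a move to another register or symbol would be contained here.
 */
static unsigned long get_task_canary(void)
{
	return current_task->stack_canary;
}

typedef void (*body_fn)(unsigned long *frame_slot);

/*
 * Hand-written model of what the compiler would emit around a protected
 * function: copy the guard into the frame, run the body, compare again.
 * Returns 1 if the frame copy is intact, 0 if it was clobbered.
 */
static int run_guarded(body_fn body)
{
	unsigned long slot = get_task_canary();	/* prologue */

	body(&slot);				/* body may overflow into slot */
	return slot == get_task_canary();	/* epilogue check */
}

/* A body that leaves the frame alone. */
static void well_behaved(unsigned long *frame_slot)
{
	(void)frame_slot;
}

/* A body that models an overflow reaching the canary slot. */
static void smasher(unsigned long *frame_slot)
{
	*frame_slot ^= 1UL;
}
```

The point of the sketch is only that run_guarded() never needs to know
how get_task_canary() finds the value.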

> >> The purpose of this email is to reach agreement between the various
> >> stakeholders (mainly the arm64 linux maintainers and the ARM GCC
> >> maintainers) on a way to proceed with implementing this in GCC.
> >
> > Would it be possible to have an always inline function to get the
> > canary, which GCC would implicitly fold into functions as necessary?
> >
> > ... then we could have something in a header, like:
> >
> > static __always_inline unsigned long get_task_canary(void)
> > {
> >         return current->canary;
> > }
> >
> > ... which we could change in future as needs change.
> >
> 
> This is a question to the compiler folks, I suppose, but I'd venture a
> guess that this is rather hard. Perhaps a true function call would be
> better if it is done in a way that can be optimized by LTO (this is of
> course assuming that by GCC 9, this is something we are likely to use
> in the kernel)
> 
> An alternative could be to decide to rely on a GCC plugin instead
> (although this would not be my preference). My poc implementation is a
> bit clunky, but I did not spend a lot of time on it. If we can refine
> it to replace the high/lo ref to __stack_chk_guard with something more
> robust, then we remain in control of which register and/or symbol ref
> we use and we don't paint ourselves into a corner.

I'm wary of our ability to maintain a GCC plugin in the kernel source tree.
I would *much* prefer to have proper support in the compiler.

Will


