[PATCH] arm64: add prctl(PR_PAC_SET_ENABLED_KEYS)

Mon Aug 24 10:49:11 EDT 2020

On Wed, Aug 19, 2020 at 02:25:45PM -0700, Peter Collingbourne wrote:
> On Wed, Aug 19, 2020 at 3:18 AM Dave Martin <Dave.Martin at arm.com> wrote:
> >
> > On Fri, Jul 31, 2020 at 06:11:52PM -0700, Peter Collingbourne wrote:
> > > This prctl allows the user program to control which PAC keys are enabled
> > > in a particular task. The main reason why this is useful is to enable a
> > > userspace ABI that uses PAC to sign and authenticate function pointers
> > > and other pointers exposed outside of the function, while still allowing
> > > binaries conforming to the ABI to interoperate with legacy binaries that
> > > do not sign or authenticate pointers.
> > >
> > > The idea is that a dynamic loader or early startup code would issue
> > > this prctl very early after establishing that a process may load legacy
> > > binaries, but before executing any PAC instructions.
> >
> > Apologies for the slow response on this, I'd had it on my list for a
> > while...
> >
> > > ---
> > >  .../arm64/pointer-authentication.rst          | 27 +++++++++++++++
> > >  arch/arm64/include/asm/asm_pointer_auth.h     | 19 +++++++++++
> > >  arch/arm64/include/asm/pointer_auth.h         | 10 ++++--
> > >  arch/arm64/include/asm/processor.h            |  5 +++
> > >  arch/arm64/kernel/asm-offsets.c               |  1 +
> > >  arch/arm64/kernel/pointer_auth.c              | 34 +++++++++++++++++++
> > >  include/uapi/linux/prctl.h                    |  3 ++
> > >  kernel/sys.c                                  |  8 +++++
> > >  8 files changed, 105 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/arm64/pointer-authentication.rst b/Documentation/arm64/pointer-authentication.rst
> > > index 30b2ab06526b..1f7e064deeb3 100644
> > > --- a/Documentation/arm64/pointer-authentication.rst
> > > +++ b/Documentation/arm64/pointer-authentication.rst
> > > @@ -107,3 +107,30 @@ filter out the Pointer Authentication system key registers from
> > >  KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID
> > >  register. Any attempt to use the Pointer Authentication instructions will
> > >  result in an UNDEFINED exception being injected into the guest.
> > > +
> > > +
> > > +Enabling and disabling keys
> > > +---------------------------
> > > +
> > > +The prctl PR_PAC_SET_ENABLED_KEYS allows the user program to control which
> > > +PAC keys are enabled in a particular task. It takes two arguments, the
> > > +first being a bitmask of PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY
> > > +and PR_PAC_APDBKEY specifying which keys shall be affected by this prctl,
> > > +and the second being a bitmask of the same bits specifying whether the key
> > > +should be enabled or disabled. For example::
> > > +
> > > +  prctl(PR_PAC_SET_ENABLED_KEYS,
> > > +        PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY | PR_PAC_APDBKEY,
> > > +        PR_PAC_APIBKEY, 0, 0);
> > > +
> > > +disables all keys except the IB key.
> > > +
> > > +The main reason why this is useful is to enable a userspace ABI that uses PAC
> > > +instructions to sign and authenticate function pointers and other pointers
> > > +exposed outside of the function, while still allowing binaries conforming to
> > > +the ABI to interoperate with legacy binaries that do not sign or authenticate
> > > +pointers.
> >
> > What actually breaks without this?
> >
> > Since the keys are all enabled by default, the only purpose of this
> > prctl seems to be to disable keys.  I'm not sure what this is giving us.
> 
> Yes, the purpose is to disable keys. Let's consider the function
> pointer signing userspace ABI use case. An example is Apple's arm64e
> ABI, and I have a prototype branch of LLVM [0] that implements a
> similar ABI in Linux userspace, based on Apple's implementation of
> their ABI.
> 
> Here's an example of a function that returns a function pointer, and a
> function that calls a function pointer of the same type:
> 
> static void f(void) {}
> 
> void *return_fp(void) {
>   return f;
> }
> 
> void call_fp(void (*p)(void)) {
>   p();
> }
> 
> If I compile this with my prototype compiler I get:
> 
> $ clang --target=aarch64-android   -fptrauth-calls  fptr.c -S -o - -O3
> -march=armv8.3a
> [...]
> return_fp:                              // @return_fp
> // %bb.0:                               // %entry
>         adrp    x16, f
>         add     x16, x16, :lo12:f
>         mov     x17, #16277
>         pacia   x16, x17
>         mov     x0, x16
>         ret
> [...]
> call_fp:                                // @call_fp
> // %bb.0:                               // %entry
>         mov     w1, #16277
>         braa    x0, x1
> [...]
> 
> In this code snippet the function pointer is signed with the IA key
> and discriminator 16277 before being returned. When the function is
> called, the pointer is first authenticated with the same key and
> discriminator.
> 
> Now imagine that this code lives in a shared library used both by
> programs that use the function pointer signing ABI and by legacy
> binaries (i.e. programs that use the existing ABI), and we have a
> legacy binary that calls return_fp. If the legacy binary then calls
> the function pointer returned by return_fp, that code will not
> authenticate the pointer before calling it; it will just use a br or
> blr instruction to call it directly, which will lead to a crash if the
> signature bits are set in the function pointer. In order to prevent
> the crash, we need a way to cause the pacia instruction in return_fp
> to become a no-op when running inside the process hosting the legacy
> binary, so that the signature bits will remain clear and the br or blr
> instruction in the legacy binary will successfully call the function
> f. That can be done by disabling the IA key, which is exactly what
> this prctl() lets us do.
> 
> >
> > > +
> > > +The idea is that a dynamic loader or early startup code would issue this
> > > +prctl very early after establishing that a process may load legacy binaries,
> > > +but before executing any PAC instructions.
> > > diff --git a/arch/arm64/include/asm/asm_pointer_auth.h b/arch/arm64/include/asm/asm_pointer_auth.h
> > > index 52dead2a8640..d121fa5fed5f 100644
> > > --- a/arch/arm64/include/asm/asm_pointer_auth.h
> > > +++ b/arch/arm64/include/asm/asm_pointer_auth.h
> > > @@ -31,6 +31,14 @@ alternative_else_nop_endif
> > >       ldp     \tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APDB]
> > >       msr_s   SYS_APDBKEYLO_EL1, \tmp2
> > >       msr_s   SYS_APDBKEYHI_EL1, \tmp3
> > > +
> > > +     ldr     \tmp2, [\tsk, #THREAD_SCTLR_ENXX_MASK]
> > > +     cbz     \tmp2, .Laddr_auth_skip_\@
> >
> > > I wonder whether it would make sense to simply store the thread's base
> > SCTLR value (containing the ENxx bits), rather than storing the ENxx
> > bits separately.  There may be reasons outside this snippet why this
> > isn't such a good idea though -- I haven't gone digging so far.
> 
> As far as I know (as I learned [1] from the MTE patch series), it can
> be expensive to access SCTLR_EL1, therefore I arrange to only access
> SCTLR_EL1 in the hopefully-uncommon case where a process has disabled
> keys. Detecting the "process has disabled keys" case is quite simple
> if we only store the disabled keys mask here, not so much if we store
> the full value of SCTLR_EL1.

My thought was that we would still only write SCTLR_EL1 if needed, but
we would do the write-if-needed across the whole register in one go.
This would be easier to extend if we have to twiddle additional
SCTLR_EL1 bits in the future.  If the key enable bits are the only
affected bits for now then we could of course defer this factoring until
later.  (I vaguely remember a similar discussion in the past, but
possibly it was about the pauth keys anyway, rather than something
else.)

In a case like this, we'll still get overheads if there is a mixture of
tasks contending for the CPU that have different key enable settings.
But I can't see much that we can do about that.  If userspace is mostly
built with the same options (true for the Apple case I guess) then I
guess we shouldn't need SCTLR_EL1 rewrites very often just for this.  In
other environments it may be messier.
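
To make the write-if-needed idea above a bit more concrete, here is a
rough userspace model in C.  It is only a sketch: the names
(model_task, model_cpu_sctlr, model_switch_to) are made up for
illustration and are not from the patch, and it assumes we keep a
cached copy of the current register value so that the comparison
itself does not need a system register access.

```c
#include <stdint.h>

/* Hypothetical cached copy of this CPU's SCTLR_EL1 value. */
static uint64_t model_cpu_sctlr;

/* Counts the (expensive) register writes the model would perform. */
static unsigned int model_sctlr_writes;

/* Hypothetical per-task state: the whole SCTLR_EL1 value for userspace. */
struct model_task {
    uint64_t sctlr_user;
};

/* Write the whole register in one go, but only if the value changes. */
static void model_switch_to(const struct model_task *next)
{
    if (model_cpu_sctlr != next->sctlr_user) {
        model_cpu_sctlr = next->sctlr_user;
        model_sctlr_writes++;
    }
}
```

Switching between tasks with identical settings costs no write in this
model; only a switch between tasks whose values differ does.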

> 
> > > +
> > > +     mrs_s   \tmp3, SYS_SCTLR_EL1
> > > +     bic     \tmp3, \tmp3, \tmp2
> > > +     msr_s   SYS_SCTLR_EL1, \tmp3
> > > +
> > >  .Laddr_auth_skip_\@:
> > >  alternative_if ARM64_HAS_GENERIC_AUTH
> > >       ldp     \tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APGA]
> > > @@ -45,6 +53,17 @@ alternative_else_nop_endif
> > >       ldp     \tmp2, \tmp3, [\tmp1, #PTRAUTH_KERNEL_KEY_APIA]
> > >       msr_s   SYS_APIAKEYLO_EL1, \tmp2
> > >       msr_s   SYS_APIAKEYHI_EL1, \tmp3
> > > +
> > > +     ldr     \tmp2, [\tsk, #THREAD_SCTLR_ENXX_MASK]
> > > +     cbz     \tmp2, .Lset_sctlr_skip_\@
> > > +
> > > +     mrs_s   \tmp1, SYS_SCTLR_EL1
> > > +     mov     \tmp2, #(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA)
> >
> > (Nit: harmless but unnecessary ().  # is not an operator as such, just
> > random syntax.  Whatever follows is greedily parsed as an immediate
> > expression.)
> 
> Okay. While looking around in the kernel I noticed that there is a
> mov_q macro that can be used to avoid manually splitting the constant
> into 16-bit chunks, and apparently it doesn't require a #. I'll use it
> in v2.
> 
> > > +     movk    \tmp2, #SCTLR_ELx_ENDB
> >
> > Why do we check THREAD_SCTLR_ENXX_MASK, and then proceed to set all the
> > ENxx bits unconditionally?  I may be missing something here.
> 
> This code is to support the case where we are returning to the kernel
> from a userspace task with keys disabled. The kernel needs at least
> the IA key enabled in order for its own use of reverse-edge PAC to
> work correctly. When returning from a userspace task with no keys
> disabled, the keys enabled bits already have the correct values, so
> there is nothing to be done (and as mentioned above, I avoid touching
> SCTLR_EL1 unless necessary because it is apparently expensive to do
> so). But in a process with keys disabled, we will need to re-enable at
> least IA.
> 
> We may be able to get away with just enabling IA here, but that would
> break the invariant that all keys are enabled inside the kernel, which
> is relied on by the code that decides whether to access SCTLR_EL1 in
> order to disable keys when entering a userspace task.

OK, I think I just confused myself here: we are not setting the key
enables for userspace, but for the kernel, and we only need to do that
if the user task had some keys disabled in the first place.
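
For my own benefit, the two paths as I now understand them, written as
a small C model rather than the real assembly.  This is a sketch only:
the function names are invented, and the bit positions are what I
believe the SCTLR_EL1 layout to be, so treat them as assumptions.

```c
#include <stdint.h>

/* Assumed SCTLR_EL1 key enable bit positions (illustrative). */
#define MODEL_ENIA (UINT64_C(1) << 31)
#define MODEL_ENIB (UINT64_C(1) << 30)
#define MODEL_ENDA (UINT64_C(1) << 27)
#define MODEL_ENDB (UINT64_C(1) << 13)
#define MODEL_ENXX_ALL (MODEL_ENIA | MODEL_ENIB | MODEL_ENDA | MODEL_ENDB)

/* Return to userspace: clear the enables for the task's disabled keys. */
static uint64_t model_exit_to_user(uint64_t sctlr, uint64_t enxx_mask)
{
    if (enxx_mask == 0)
        return sctlr;           /* common case: register left alone */
    return sctlr & ~enxx_mask;
}

/* Entry to the kernel: set all enables again if the task disabled any. */
static uint64_t model_enter_kernel(uint64_t sctlr, uint64_t enxx_mask)
{
    if (enxx_mask == 0)
        return sctlr;           /* nothing was disabled, nothing to do */
    return sctlr | MODEL_ENXX_ALL;
}
```

The check on the mask decides only whether the register is touched at
all; the kernel side always ends up with every key enabled.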

> 
> > > +     orr     \tmp1, \tmp1, \tmp2
> > > +     msr_s   SYS_SCTLR_EL1, \tmp1
> > > +
> > > +.Lset_sctlr_skip_\@:
> > >       .endm
> > >
> > >       .macro ptrauth_keys_install_kernel_nosync tsk, tmp1, tmp2, tmp3
> > > diff --git a/arch/arm64/include/asm/pointer_auth.h b/arch/arm64/include/asm/pointer_auth.h
> > > index c6b4f0603024..d4c375454a36 100644
> > > --- a/arch/arm64/include/asm/pointer_auth.h
> > > +++ b/arch/arm64/include/asm/pointer_auth.h
> > > @@ -70,14 +70,19 @@ static __always_inline void ptrauth_keys_switch_kernel(struct ptrauth_keys_kerne
> > >  }
> > >
> > >  extern int ptrauth_prctl_reset_keys(struct task_struct *tsk, unsigned long arg);
> > > +extern int ptrauth_prctl_set_enabled_keys(struct task_struct *tsk,
> > > +                                       unsigned long keys,
> > > +                                       unsigned long enabled);
> > >
> > >  static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
> > >  {
> > >       return ptrauth_clear_pac(ptr);
> > >  }
> > >
> > > -#define ptrauth_thread_init_user(tsk)                                        \
> > > -     ptrauth_keys_init_user(&(tsk)->thread.keys_user)
> > > +#define ptrauth_thread_init_user(tsk) do {                           \
> > > +             ptrauth_keys_init_user(&(tsk)->thread.keys_user);       \
> > > +             (tsk)->thread.sctlr_enxx_mask = 0;                      \
> > > +     } while (0)
> > >  #define ptrauth_thread_init_kernel(tsk)                                      \
> > >       ptrauth_keys_init_kernel(&(tsk)->thread.keys_kernel)
> > >  #define ptrauth_thread_switch_kernel(tsk)                            \
> > > @@ -85,6 +90,7 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
> > >
> > >  #else /* CONFIG_ARM64_PTR_AUTH */
> > >  #define ptrauth_prctl_reset_keys(tsk, arg)   (-EINVAL)
> > > +#define ptrauth_prctl_set_enabled_keys(tsk, keys, enabled)   (-EINVAL)
> > >  #define ptrauth_strip_insn_pac(lr)   (lr)
> > >  #define ptrauth_thread_init_user(tsk)
> > >  #define ptrauth_thread_init_kernel(tsk)
> > > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > > index 240fe5e5b720..6974d227b01f 100644
> > > --- a/arch/arm64/include/asm/processor.h
> > > +++ b/arch/arm64/include/asm/processor.h
> > > @@ -150,6 +150,7 @@ struct thread_struct {
> > >  #ifdef CONFIG_ARM64_PTR_AUTH
> > >       struct ptrauth_keys_user        keys_user;
> > >       struct ptrauth_keys_kernel      keys_kernel;
> > > +     u64                             sctlr_enxx_mask;
> > >  #endif
> > >  };
> > >
> > > @@ -313,6 +314,10 @@ extern void __init minsigstksz_setup(void);
> > >  /* PR_PAC_RESET_KEYS prctl */
> > >  #define PAC_RESET_KEYS(tsk, arg)     ptrauth_prctl_reset_keys(tsk, arg)
> > >
> > > +/* PR_PAC_SET_ENABLED_KEYS prctl */
> > > +#define PAC_SET_ENABLED_KEYS(tsk, keys, enabled)                             \
> > > +     ptrauth_prctl_set_enabled_keys(tsk, keys, enabled)
> > > +
> > >  #ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
> > >  /* PR_{SET,GET}_TAGGED_ADDR_CTRL prctl */
> > >  long set_tagged_addr_ctrl(unsigned long arg);
> > > diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> > > index 0577e2142284..dac80e16fe35 100644
> > > --- a/arch/arm64/kernel/asm-offsets.c
> > > +++ b/arch/arm64/kernel/asm-offsets.c
> > > @@ -47,6 +47,7 @@ int main(void)
> > >  #ifdef CONFIG_ARM64_PTR_AUTH
> > >    DEFINE(THREAD_KEYS_USER,   offsetof(struct task_struct, thread.keys_user));
> > >    DEFINE(THREAD_KEYS_KERNEL, offsetof(struct task_struct, thread.keys_kernel));
> > > +  DEFINE(THREAD_SCTLR_ENXX_MASK,offsetof(struct task_struct, thread.sctlr_enxx_mask));
> > >  #endif
> > >    BLANK();
> > >    DEFINE(S_X0,                       offsetof(struct pt_regs, regs[0]));
> > > diff --git a/arch/arm64/kernel/pointer_auth.c b/arch/arm64/kernel/pointer_auth.c
> > > index 1e77736a4f66..8c385b7f324a 100644
> > > --- a/arch/arm64/kernel/pointer_auth.c
> > > +++ b/arch/arm64/kernel/pointer_auth.c
> > > @@ -42,3 +42,37 @@ int ptrauth_prctl_reset_keys(struct task_struct *tsk, unsigned long arg)
> > >
> > >       return 0;
> > >  }
> > > +
> > > +static u64 arg_to_enxx_mask(unsigned long arg)
> > > +{
> > > +     u64 sctlr_enxx_mask = 0;
> > > +     if (arg & PR_PAC_APIAKEY)
> > > +             sctlr_enxx_mask |= SCTLR_ELx_ENIA;
> > > +     if (arg & PR_PAC_APIBKEY)
> > > +             sctlr_enxx_mask |= SCTLR_ELx_ENIB;
> > > +     if (arg & PR_PAC_APDAKEY)
> > > +             sctlr_enxx_mask |= SCTLR_ELx_ENDA;
> > > +     if (arg & PR_PAC_APDBKEY)
> > > +             sctlr_enxx_mask |= SCTLR_ELx_ENDB;
> > > +     return sctlr_enxx_mask;
> > > +}
> > > +
> > > +int ptrauth_prctl_set_enabled_keys(struct task_struct *tsk, unsigned long keys,
> > > +                                unsigned long enabled)
> > > +{
> > > +     u64 sctlr_enxx_mask = tsk->thread.sctlr_enxx_mask;
> > > +     unsigned long addr_key_mask = PR_PAC_APIAKEY | PR_PAC_APIBKEY |
> > > +                                   PR_PAC_APDAKEY | PR_PAC_APDBKEY;
> > > +
> > > +     if (!system_supports_address_auth())
> > > +             return -EINVAL;
> > > +
> > > +     if ((keys & ~addr_key_mask) || (enabled & ~keys))
> > > +             return -EINVAL;
> >
> > > Should we take into account the types of authentication supported?
> >
> > I don't recall whether we expose ptrauth to userspace if only
> > instruction authentication or only data authentication is supported.  If
> > so, should we reject attempts to configure unsupported keys here?
> >
> > We should probably try to do a consistent thing both here and in
> > PR_PAC_RESET_KEYS if so.
> 
> As far as I know, there is nothing in the architecture that would
> allow it to only advertise support for I keys or only advertise
> support for D keys. The fields AA64ISAR1_EL1.AP[AI] apply to all four
> keys: DA, DB, IA and IB. Maybe you are thinking of the GA key versus
> the other keys (which is advertised separately via
> AA64ISAR1_EL1.GP[AI])? The architecture appears to provide no way to
> disable the GA key, so I did not include support for it here.

I think I'm confusing myself here.  Yes, support for generic auth is
(supposedly) architecturally orthogonal to address auth, but data
and instruction address auth are either both supported, or both not
supported -- so your code looks correct.

> 
> > > +
> > > +     sctlr_enxx_mask |= arg_to_enxx_mask(keys);
> > > +     sctlr_enxx_mask &= ~arg_to_enxx_mask(enabled);
> > > +
> > > +     tsk->thread.sctlr_enxx_mask = sctlr_enxx_mask;
> > > +     return 0;
> >
> > Do we need a way to query the enabled keys?
> >
> > We could either have a _GET_ prctl (the common approach), or have this
> > prctl return the mask of enabled keys (avoids the extra prctl, but
> > weirder).
> >
> > As above, we might
> >
> > Things like CRIU may need a GET in order to save/restore this setting
> > properly.
> 
> Maybe it makes sense for there to be a GET prctl() to support CRIU.
> But it would need to be defined carefully because CRIU would
> presumably need to know what value to pass as the "keys" argument when
> it calls SET to restore the state. It can't just hardcode a value
> because that may harm extensibility if new keys are added.
> 
> If we require the kernel to start processes with all keys enabled
> (including any keys that we may introduce in the future), then it
> seems like if CRIU knows which keys were disabled, then it can restore
> the state by issuing a syscall like this:
> 
> prctl(PR_PAC_SET_ENABLED_KEYS, disabled_keys_mask, 0)
> 
> which would imply that instead of providing PR_PAC_GET_ENABLED_KEYS we
> instead provide PR_PAC_GET_DISABLED_KEYS, which CRIU would call when
> saving the state to prepare the disabled_keys_mask argument passed
> here. Then for consistency we can make the SET prctl() be
> PR_PAC_SET_DISABLED_KEYS, and CRIU can then do:
> 
> prctl(PR_PAC_SET_DISABLED_KEYS, disabled_keys_mask, disabled_keys_mask)
> 
> Does that make sense?

Possibly, though it's nicer if the GET returns the same value supplied
to the SET, rather than the complement of it.

If SET ignores set bits in arg3 that don't correspond to set bits in the
mask arg2, then

	u64 current_keys = prctl(PR_PAC_GET_ENABLED_KEYS);

	prctl(PR_PAC_SET_ENABLED_KEYS, ~0UL, current_keys);

should work.
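
As a sanity check of that round trip, here is a small C model of the
semantics I have in mind.  It is a sketch under assumptions, not the
patch's behaviour: the model_* names are hypothetical, the four key
bits are assumed to occupy the low four bits of the mask, and SET is
assumed to tolerate unknown bits in the mask (the posted patch instead
returns -EINVAL for those, which would make ~0UL fail).

```c
/* Assumed: the four address keys occupy bits 0..3 (illustrative). */
#define MODEL_ALL_KEYS 0xfUL

/* Model state: mask of enabled keys; all keys are enabled at start. */
static unsigned long model_enabled = MODEL_ALL_KEYS;

/* Hypothetical GET: report the currently enabled keys. */
static unsigned long model_get_enabled_keys(void)
{
    return model_enabled;
}

/*
 * Hypothetical lenient SET: unknown bits in 'keys' are ignored, as are
 * bits in 'enabled' that are not also set in 'keys'.
 */
static int model_set_enabled_keys(unsigned long keys, unsigned long enabled)
{
    keys &= MODEL_ALL_KEYS;
    enabled &= keys;
    model_enabled = (model_enabled & ~keys) | enabled;
    return 0;
}
```

With this, saving the GET result and later passing it back with a mask
of ~0UL restores the same state without the caller needing to know
which keys exist.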

There's a final option, which is to expose this config through ptrace
instead for save/restore purposes.  From previous discussions with the
CRIU folks, I recall that they are trying to move away from doing setup
from within the new process where possible.  

There's no reason not to have both though -- there's precedent for that,
such as for PR_SVE_{GET,SET}_VL and the NT_ARM_SVE regset.  MTE may move
in a similar direction too IIUC.

Having a GET remains useful for in-process debugging and diagnostics,
and it's extremely straightforward to add in the kernel.  So from my
side I'd vote to have it anyway...

Cheers
---Dave