[PATCH v8] arm64: Expose FAR_EL1 tag bits in siginfo

Dave Martin Dave.Martin at arm.com
Tue Jul 7 10:19:45 EDT 2020


On Mon, Jul 06, 2020 at 12:20:33PM -0700, Peter Collingbourne wrote:
> On Mon, Jul 6, 2020 at 9:41 AM Dave Martin <Dave.Martin at arm.com> wrote:
> >
> > On Wed, Jun 24, 2020 at 12:51:43PM -0700, Peter Collingbourne wrote:
> > > On Wed, Jun 24, 2020 at 10:12 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > >
> > > > On Wed, Jun 24, 2020 at 09:51:49AM -0700, Peter Collingbourne wrote:
> > > > > On Wed, Jun 24, 2020 at 2:28 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 05:40:08PM -0700, Peter Collingbourne wrote:
> > > > > > > On Tue, Jun 23, 2020 at 10:52 AM Eric W. Biederman
> > > > > > > <ebiederm at xmission.com> wrote:
> > > > > > > >
> > > > > > > > Dave Martin <Dave.Martin at arm.com> writes:
> > > > > > > >
> > > > > > > > > On Tue, Jun 23, 2020 at 07:54:59AM -0500, Eric W. Biederman wrote:
> > > > > > > > >> Peter Collingbourne <pcc at google.com> writes:
> > > > > > > > >>
> > > > > > > > >> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > > > > >> > index 47f651df781c..a8380a2b6361 100644
> > > > > > > > >> > --- a/arch/arm64/kernel/traps.c
> > > > > > > > >> > +++ b/arch/arm64/kernel/traps.c
> > > > > > > > >> > @@ -235,20 +235,41 @@ static void arm64_show_signal(int signo, const char *str)
> > > > > > > > >> >  }
> > > > > > > > >> >
> > > > > > > > >> >  void arm64_force_sig_fault(int signo, int code, void __user *addr,
> > > > > > > > >> > +                     unsigned long far, unsigned char far_tb_mask,
> > > > > > > > >> >                       const char *str)
> > > > > > > > >> >  {
> > > > > > > > >> >    arm64_show_signal(signo, str);
> > > > > > > > >> > -  if (signo == SIGKILL)
> > > > > > > > >> > +  if (signo == SIGKILL) {
> > > > > > > > >> >            force_sig(SIGKILL);
> > > > > > > > >> > -  else
> > > > > > > > >> > -          force_sig_fault(signo, code, addr);
> > > > > > > > >> > +  } else {
> > > > > > > > >> > +          struct kernel_siginfo info;
> > > > > > > > >> > +          clear_siginfo(&info);
> > > > > > > > >> > +          info.si_signo = signo;
> > > > > > > > >> > +          info.si_errno = 0;
> > > > > > > > >> > +          info.si_code = code;
> > > > > > > > >> > +          info.si_addr = addr;
> > > > > > > > >> > +          info.si_addr_top_byte = (far >> 56) & far_tb_mask;
> > > > > > > > >> > +          info.si_addr_top_byte_mask = far_tb_mask;
> > > > > > > > >> > +          force_sig_info(&info);
> > > > > > > > >> > +  }
> > > > > > > > >> >  }
> > > > > > > > >> >
> > > > > > > > >> >  void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
> > > > > > > > >> > -                      const char *str)
> > > > > > > > >> > +                      unsigned long far, const char *str)
> > > > > > > > >> >  {
> > > > > > > > >> > +  struct kernel_siginfo info;
> > > > > > > > >> > +
> > > > > > > > >> >    arm64_show_signal(SIGBUS, str);
> > > > > > > > >> > -  force_sig_mceerr(code, addr, lsb);
> > > > > > > > >> > +
> > > > > > > > >> > +  clear_siginfo(&info);
> > > > > > > > >> > +  info.si_signo = SIGBUS;
> > > > > > > > >> > +  info.si_errno = 0;
> > > > > > > > >> > +  info.si_code = code;
> > > > > > > > >> > +  info.si_addr = addr;
> > > > > > > > >> > +  info.si_addr_lsb = lsb;
> > > > > > > > >> > +  info.si_addr_top_byte = far >> 56;
> > > > > > > > >> > +  info.si_addr_top_byte_mask = 0xff;
> > > > > > > > >> > +  force_sig_info(&info);
> > > > > > > > >> >  }
> > > > > > > > >>
> > > > > > > > >> I have a real problem with this construction.  force_sig_info is not an
> > > > > > > > >> interface that should be used for anything except to define a wrapper
> > > > > > > > >> that takes it's parameters.
> > > > > > > > >
> > > > > > > > > Can you elaborate?  How would you do this king of thing.
> > > > > > > >
> > > > > > > > There are no other uses of force_sig_info in architecture code.
> > > > > > > >
> > > > > > > > I just removed them _all_ because they were almost all broken.
> > > > > > > > In fact your mcerr case is broken because it uses two different
> > > > > > > > union members simultantiously.
> > > > > > >
> > > > > > > Is that really broken? I thought that the Linux kernel deliberately
> > > > > > > didn't care about strict aliasing rules (the top-level Makefile passes
> > > > > > > -fno-strict-aliasing) so I thought that it was valid in "Linux kernel
> > > > > > > C" even though from a standards point of view it is invalid. (That
> > > > > > > being said, this is probably moot with my proposed changes below
> > > > > > > though.)
> > > > > >
> > > > > > I have a feeling that -fno-strict-aliasing only allows you to _read_ a
> > > > > > different union member from the one previously written.
> > > > > >
> > > > > > Writing a different member from the last one written can still splatter
> > > > > > on the other members IIUC.
> > > > > >
> > > > > > It would be better to keep things separate rather than risk
> > > > > > incorrectness just to save a few bytes.
> > > > > >
> > > > > > IMHO -fno-strict-aliasing is no excuse for gratuitous type-punning.
> > > > > >
> > > > > > > > So I am looking for something like force_sig_mcerr or force_sig_fault
> > > > > > > > that includes your new information that then calls force_sig_info.
> > > > > > > >
> > > > > > > > I know of no other way to safely use the siginfo struct.
> > > > > > >
> > > > > > > So you want something like:
> > > > > > >
> > > > > > > int force_sig_fault_with_ignored_bits(int signo, int code, void __user
> > > > > > > *addr, uintptr_t addr_ignored, uintptr_t addr_ignored_mask);
> > > > > > > int force_sig_mceerr_with_ignored_bits(int code, void __user *addr,
> > > > > > > short lsb, uintptr_t addr_ignored, uintptr_t addr_ignored_mask);
> > > > > > >
> > > > > > > in kernel/signal.c and the code in arch/arm64 would call that?
> > > > > > >
> > > > > > > > > AIUI we absolutely need a forced signal here, we need to supply
> > > > > > > > > metadata, and we don't have to open-code all that at every relevant
> > > > > > > > > signal generation site...
> > > > > > > > >
> > > > > > > > >> It is not clear to me that if you have adapted siginfo_layout.
> > > > > > > > >
> > > > > > > > > Garbled sentence?
> > > > > > > >
> > > > > > > > Looks like.  One of the pieces of code that needs to change
> > > > > > > > when siginfo gets updated is siginfo_layout so that the structure
> > > > > > > > can be properly decoded and made sense of.
> > > > > > > >
> > > > > > > > I am not seeing anything like that.
> > > > > > >
> > > > > > > Okay, this has to do with copying between the compat and non-compat
> > > > > > > versions of the struct? Sure, I can update that, although the code
> > > > > > > would be basically non-functional on arm64 because TBI isn't supported
> > > > > > > on 32-bit ARM.
> > > > > > >
> > > > > > > > >> > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> > > > > > > > >> > index cb3d6c267181..6dd82373eb2d 100644
> > > > > > > > >> > --- a/include/uapi/asm-generic/siginfo.h
> > > > > > > > >> > +++ b/include/uapi/asm-generic/siginfo.h
> > > > > > > > >> > @@ -91,6 +91,14 @@ union __sifields {
> > > > > > > > >> >                            char _dummy_pkey[__ADDR_BND_PKEY_PAD];
> > > > > > > > >> >                            __u32 _pkey;
> > > > > > > > >> >                    } _addr_pkey;
> > > > > > > > >> > +#ifdef __aarch64__
> > > > > > > > >> > +                  /* used with all si_codes */
> > > > > > > > >> > +                  struct {
> > > > > > > > >> > +                          short _dummy_top_byte;
> > > > > > > > >
> > > > > > > > > ^ What's this for?  I don't have Eric's insight here.
> > > > > > >
> > > > > > > We would need a short's worth of padding in order to prevent the
> > > > > > > fields from occupying the same address as si_addr_lsb.
> > > > > > >
> > > > > > > > >
> > > > > > > > >> > +                          unsigned char _top_byte;
> > > > > > > > >> > +                          unsigned char _top_byte_mask;
> > > > > > > > >> > +                  } _addr_top_byte;
> > > > > > > > >> > +#endif
> > > > > > > > >> >            };
> > > > > > > > >> >    } _sigfault;
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >> Why the _dummy_top_byte?  Oh I see it should be spelled "short _addr_lsb;".
> > > > > > > > >>
> > > > > > > > >> Please remove the "#ifdef __aarch64__".  If at all possible we want to
> > > > > > > > >> design this so any other architecture who has this challenge can use the
> > > > > > > > >> code.  The kind of code does not get enough attention/maintenance if it
> > > > > > > > >> is built for a single architecture.
> > > > > > >
> > > > > > > Seems reasonable. I was recently made aware that RISC-V was
> > > > > > > considering a similar feature:
> > > > > > > https://lists.riscv.org/g/tech-tee/topic/risc_v_tbi_proposal/72855478
> > > > > > > I would have opted to expand this to other architectures on an
> > > > > > > as-needed basis, but I'd also be fine with having it on all
> > > > > > > architectures from the start.
> > > > > > >
> > > > > > > If we make this arch-independent, we have an additional concern, which
> > > > > > > is "what if some future architecture wants more than one byte here?"
> > > > > > > For example, an architecture may have a "top-two-bytes-ignore"
> > > > > > > feature, which would imply two-byte (misnamed) "si_addr_top_byte" and
> > > > > > > "si_addr_top_byte_mask" fields. And the RISC-V proposal potentially
> > > > > > > implies many more ignored bits (see slide 13 of the presentation). The
> > > > > > > maximum size that these fields can possibly be is the size of a
> > > > > > > pointer, and with that there wouldn't be enough room in the padding at
> > > > > > > this point to accommodate the new fields.
> > > > > > >
> > > > > > > That basically implies your earlier suggestion of adding a union
> > > > > > > member here to accommodate future expansion of the union, and adding
> > > > > > > the new fields after the union. I'm happy to make that change, with
> > > > > > > the fields renamed "si_addr_ignored" and "si_addr_ignored_mask".
> > > > > >
> > > > > > I think what we need here is basically a flags word.
> > > > > >
> > > > > > So long as we keep a flag spare to indicate the existence of a further
> > > > > > flags word, we can extend as needed.
> > > > > >
> > > > > > How the existence of the first flags words is detected is another
> > > > > > problem.  If it only applies for newly-defined si_code values, then
> > > > > > I guess si_code may be sufficient.
> > > > >
> > > > > Existing kernels will zero-initialize unused regions of the siginfo
> > > > > data structure. The zero-initialization of the padding at the end of
> > > > > the struct is done by the clear_user call here:
> > > > > https://github.com/torvalds/linux/blob/3e08a95294a4fb3702bb3d35ed08028433c37fe6/kernel/signal.c#L3193
> > > > >
> > > > > and the zero-initialization of the padding between fields and unused
> > > > > union members is done by the clear_siginfo function which the kernel
> > > > > calls when initializing the data structure:
> > > > > https://github.com/torvalds/linux/blob/3e08a95294a4fb3702bb3d35ed08028433c37fe6/include/linux/signal.h#L20
> > > > >
> > > > > Therefore, a flag word value of 0 may be used to detect a lack of
> > > > > support for flagged fields.
> > > >
> > > > It's not enough that we do this today.  We would have had to do it back
> > > > to the dawn of time (though in the arm64 case I guess we just need to go
> > > > back to when the arch/arm64 was merged).
> > > >
> > > > v2.6.12:kernel/signal.c:copy_siginfo_to_user() suggests that this wasn't
> > > > always the case, so unused parts of siginfo could be full of old junk
> > > > from the user stack, if the kernel is sufficiently old.
> > > >
> > > > If we're trying to do something generic that makes sense on all arches,
> > > > this matters.  I may have misunderstood something about the code though.
> > >
> > > Hmm, I think you're right. The current behavior was introduced by
> > > commit c999b933faa5e281e3af2e110eccaf91698b0a81 which was first
> > > released in 4.18. So if an application wants to be compatible with
> > > pre-4.18 kernels then there would need to be some other way to
> > > indicate that the fields are valid. Probably the simplest way would be
> > > to have the application issue a uname(2) syscall and check the kernel
> > > version before reading these fields. I have a couple of other ideas
> > > that don't rely on version detection, if we'd prefer to avoid that.
> > > (They are somewhat ugly, but our hand is forced by backwards
> > > compatibility.)
> > >
> > > One idea is to re-purpose the si_errno field as a flags field for
> > > certain signal numbers. I checked a few kernel releases going back to
> > > 2.6.18 and it looks like the field is set to 0 except in the following
> > > circumstances:
> > > - sending a hardware breakpoint (SIGTRAP/TRAP_HWBKPT)
> > > - seccomp failures (SIGSYS/SYS_SECCOMP)
> > > - user-defined signal via kill_pid_usb_asyncio
> > > - SIGSWI in 3.18 and before (code since removed)
> > >
> > > It is also set to EFAULT for certain SIGSEGV/SEGV_MAPERR signals on
> > > powerpc since commit c96c4436aba4c12f1f48369f2f90bc43e12fe36c, which
> > > is currently unreleased. So if we wanted to go this route for SIGSEGV
> > > we would need to stop the kernel from setting si_errno to EFAULT for
> > > this signal before the 5.8 release.
> > >
> > > Another idea was to have userspace set a flag in sa_flags when
> > > registering a signal handler meaning "this signal handler requires
> > > unknown siginfo fields to be zeroed", and have existing kernels reject
> > > the syscall due to an unknown flag being set, but unfortunately this
> > > won't work because existing kernels do not reject sigaction syscalls
> > > with unknown flags set in sa_flags. A perhaps more radical idea in
> > > this vein would be to claim some of the upper bits of the signal
> > > number as flags that will cause the syscall to be rejected if set and
> > > unknown to the kernel. Existing kernels (going back to at least
> > > 2.6.18) contain this code in do_sigaction:
> > >
> > >         if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
> > >                 return -EINVAL;
> > >
> > > and vald_signal is defined as:
> > >
> > > static inline int valid_signal(unsigned long sig)
> > > {
> > >         return sig <= _NSIG ? 1 : 0;
> > > }
> > >
> > > All architectures define _NSIG as a value <= 128, so they will reject
> > > a signal number with any of bits 8-31 set. This means that we can use
> > > any of those bits for mandatory flags. Most likely we could use bit 30
> > > (expanding down as necessary), as it keeps the signal number positive
> > > and permits future expansion of the signal number range.
> >
> > Does the signal core code actually gurantee to zero the unused fields?
> > Unless the fields are poked in by hand this is fraught with subtlelies,
> > especially when unions are involved.  (I'm sure the code tries to do it,
> > but I've not eyeballed it in detail...)
> 
> It memsets the siginfo structure before setting the fields and sending
> the signal (grep for clear_siginfo which is just a memset; you should
> find a call before all callers of force_sig_info). Memset is the right
> approach here since unlike setting fields by hand it clears padding
> which could lead to information leaks from the kernel. IIUC this is
> the reason why Eric wants all of the signals to be raised via wrappers
> in kernel/signal.c instead of via force_sig_info directly (to make
> this aspect easier to audit).

My impression was that the reason for this model is partly to ensure
that siginfo fields are populated more consistently.  When it was all
down to the individual callers, inconsistencies creeped in.

With regard to memset(), this is not a complete defence against data
leakage.  Assigning to a struct member can set any or all padding in
the struct to random garbage (consider write-combining of neighboring
member writes into a single larger accesses in asm for example).  The
only way to avoid this is to ensure that the struct is 100%
padding-free, and that each member of a union is the same size.  A
quick clance at <uapi/asm-generic/siginfo.h> confirms that this is not
the case.

This might need to be looked at separately.

But it does mean, strictly speaking, that we can't reliably add new
fields anywhere that there was previously padding: assigning to
neighboring members can still fill those with garbage after the
memset().

> > Using unused bits in the signal number to turn on new functionality
> > feels risky.  As currently specified, this is just a number.  Since
> > today a successful sigaction(n ...) guarantees that n is a valid signal
> > number, reasonable code like the following would trigger a buffer
> > overrun if we start trying to encode anything else in there:
> >
> > struct sigaction actions[NSIG];
> >
> > int do_something( ... )
> > {
> >         ...
> >
> >         if (!sigaction(n, sa, ...)) {
> >                 actions[n] = *sa;
> >                 return 0;
> >         }
> >
> >         ...
> > }
> 
> I imagine the bit in the signal number being set by the direct caller
> to sigaction, and we could specifically recommend that calling
> pattern. In that case, your "n" wouldn't have the bit set in it. It

I can imagine this too, but that doesn't mean that software does it.

If the above kind of thing exists in a framework or library somewhere,
we could get problems.  Similarly, a pre-existing LD_PRELOAD framework
that provides a wrapper for sigaction may now go wrong even if your
pattern is followed -- i.e., the caller thinks it's calling sigaction
directly but in fact it isn't.

> could only appear in newly-written code that doesn't follow our
> recommendations, and there are already plenty of much more likely ways
> to cause buffer overflows in C code that doesn't follow
> recommendations anyway. (And even if such a buffer overflow occurred,
> it would very likely be caught early in development by the MMU due to
> the magnitude of the number 1<<30.)

Choosing the bit value is hard.  If shitfing it overflows, this can
trigger random undefined behaviour in the compiler in addition to (or
perhaps instead of) an out-of-bounds access or segfault.

If shifting it doesn't overflow, we might still fall into a valid
mapping, though I'd agree a segfault is more likely.

> 
> > I think it would be cleaner for to add a single flag field that can be
> > used for detecting other extensions, and request it via a new sa_flags
> > bit.  This removes the need for sematically useless zeroing of unused
> > fields (though for hygiene and backwards compatibility reasons we would
> > probably want to carry on zeroing them anyway).
> >
> > I can see no simpler way to add supplementary siginfo fields for
> > existing si_codes.  For si_codes that didn't exist before the zeroing
> > came in we could still detect optional si_code-specific fields via
> > zeroing, but it seems messary to have two ways of detecting extensions.
> 
> That would certainly be cleaner if it worked, but that would only be
> the case if old kernels rejected unknown bits in sa_flags, and
> unfortunately they don't. With the bit in the signal number, the "old

Hmm, that is a problem I wasn't aware of.

> kernels reject" behavior admits relatively straightforward usage code:
> 
> void set_segv_handler(void) {
>   struct sigaction sa;
>   sa.sa_sigaction = handle_segv;
>   sa.sa_flags = SA_SIGINFO;
>   if (sigaction(SIGSEGV | SF_CLEAR_UNKNOWN_FIELDS, &sa, 0) < 0) { //
> succeeds in new kernels, fails in old kernels
>     sa.sa_sigaction = clear_fields_and_handle_segv;
>     if (sigaction(SIGSEGV, &sa, 0) < 0) // succeeds in old kernels
>       perror("sigaction");
>   }
> }
> 
> void clear_fields_and_handle_segv(int signum, siginfo_t *sa, void *ctx) {
>   sa->si_future_field = 0;
>   handle_segv(signum, sa, ctx);
> }
> 
> void handle_segv(int signum, siginfo_t *sa, void *ctx) {
>   // At this point, si_future_field will have the value 0 in old
> kernels and the kernel-supplied value in new kernels.
> }
> 
> Imagine if we moved the flag SF_CLEAR_UNKNOWN_FIELDS from the signal
> number to sa_flags. In that case, the first sigaction would succeed in
> old kernels so handle_segv wouldn't know whether it can safely read
> from si_future_field. With the sa_flags approach, you would need
> kernel version number checking via uname before setting the flag in
> sa_flags, and at that point why even have the flag in sa_flags at all
> since you could just have the signal handler conditionally read from
> si_future_field based on the uname?

Software setting SA_SIFLAGS (or whatever) is new by definition, since
it would be using a new #define.  So it might be reasonable to put the
burden on that software to verify that the flag was really accepted by
the kernel, by reading it back.

Unfortunately, even relatively recent kernels blindly store sa_flags
in the kernel without validating it, and so it looks like duff flags
can be read back out via a sigaction() call.  Dang.


Perhaps a new frontend syscall could be added.  A new libc that knows
about this "sigaction2" could use it and mask off problem bits from
sa_flags in its sigaction() wrapper before calling sigaction2.  An old
libc would call the old sigaction syscall, where we would ignore these
new sa_flags bits as before.

This may not be a popular approach though, and software wouldn't be able
to use our new features until libc is updated to match.

If we go down this route, it may provide additional opportunities to fix
annoying defects in the old interface.


> Note that the same applies to a flag indicating the availability of a
> si_flags field in sigaction (just
> s/SF_CLEAR_UNKNOWN_FIELDS/SF_HAS_SI_FLAGS/ and
> s/si_future_field/si_flags/ in the usage code above). In terms of
> SF_CLEAR_UNKNOWN_FIELDS versus SF_HAS_SI_FLAGS I'd be fine either way.
> 
> Another thought that occurred to me is that we may consider
> generalizing this a step further and introducing a single flag bit in
> the signal number that means "reject unknown flags in sa_flags". This
> would mean that we wouldn't need to add any more flag bits to the
> signal number in the future, thus limiting this signal number hack to
> a single bit; all future mandatory behavior changes could just be put
> behind a flag in sa_flags and userspace code would easily be able to
> detect missing support for a flag and fall back if necessary. In our
> case, this would imply usage code like this:
> 
> void set_segv_handler(void) {
>   struct sigaction sa;
>   sa.sa_sigaction = handle_segv;
>   sa.sa_flags = SA_SIGINFO | SA_CLEAR_UNKNOWN_FIELDS;
>   // Succeeds in kernels with SA_CLEAR_UNKNOWN_FIELDS support.
>   // Fails in kernels with SF_CHECK_SA_FLAGS support but no
> SA_CLEAR_UNKNOWN_FIELDS support (because of the unknown flags check).
>   // Fails in kernels without SF_CHECK_SA_FLAGS support (because of
> the bounds check on the signal number).
>   if (sigaction(SIGSEGV | SF_CHECK_SA_FLAGS, &sa, 0) < 0) {
>     sa.sa_sigaction = clear_fields_and_handle_segv;
>     sa.sa_flags = SA_SIGINFO;
>     // Succeeds in old kernels, no need to use SF_CHECK_SA_FLAGS since
> we're using sa_flags from the beginning of time.
>     if (sigaction(SIGSEGV, &sa, 0) < 0)
>       perror("sigaction");
>   }
> }

As with the other options this could work, but looks like it could
break the ABI due to violating the original semantics for the signal
number argument.  Perhaps I'm being too paranoid.

Cheers
---Dave



More information about the linux-arm-kernel mailing list