[PATCH v8] arm64: Expose FAR_EL1 tag bits in siginfo

Mon Aug 17 23:16:42 EDT 2020

On Tue, Jul 14, 2020 at 10:36 AM Dave Martin <Dave.Martin at arm.com> wrote:
>
> On Mon, Jul 13, 2020 at 01:50:30PM -0700, Peter Collingbourne wrote:
> > On Mon, Jul 13, 2020 at 6:24 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > >
> > > On Wed, Jul 08, 2020 at 03:21:13PM -0700, Peter Collingbourne wrote:
> > > > On Wed, Jul 8, 2020 at 6:58 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > > >
> > > > > On Wed, Jul 08, 2020 at 12:00:22PM +0100, Dave Martin wrote:
> > > > > > On Tue, Jul 07, 2020 at 12:07:09PM -0700, Peter Collingbourne wrote:
> > > > > > > On Tue, Jul 7, 2020 at 7:19 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jul 06, 2020 at 12:20:33PM -0700, Peter Collingbourne wrote:
> > > > > > > > > On Mon, Jul 6, 2020 at 9:41 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 24, 2020 at 12:51:43PM -0700, Peter Collingbourne wrote:
> > > > > > > > > > > On Wed, Jun 24, 2020 at 10:12 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 24, 2020 at 09:51:49AM -0700, Peter Collingbourne wrote:
> > > > > > > > > > > > > On Wed, Jun 24, 2020 at 2:28 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 23, 2020 at 05:40:08PM -0700, Peter Collingbourne wrote:
> > > > > > > > > > > > > > > On Tue, Jun 23, 2020 at 10:52 AM Eric W. Biederman
> > > > > > > > > > > > > > > <ebiederm at xmission.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Dave Martin <Dave.Martin at arm.com> writes:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jun 23, 2020 at 07:54:59AM -0500, Eric W. Biederman wrote:
> > > > > > > > > > > > > > > > >> Peter Collingbourne <pcc at google.com> writes:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > > > > > > > > > > > > >> > index 47f651df781c..a8380a2b6361 100644
> > > > > > > > > > > > > > > > >> > --- a/arch/arm64/kernel/traps.c
> > > > > > > > > > > > > > > > >> > +++ b/arch/arm64/kernel/traps.c
> > > > > > > > > > > > > > > > >> > @@ -235,20 +235,41 @@ static void arm64_show_signal(int signo, const char *str)
> > > > > > > > > > > > > > > > >> >  }
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> >  void arm64_force_sig_fault(int signo, int code, void __user *addr,
> > > > > > > > > > > > > > > > >> > +                     unsigned long far, unsigned char far_tb_mask,
> > > > > > > > > > > > > > > > >> >                       const char *str)
> > > > > > > > > > > > > > > > >> >  {
> > > > > > > > > > > > > > > > >> >    arm64_show_signal(signo, str);
> > > > > > > > > > > > > > > > >> > -  if (signo == SIGKILL)
> > > > > > > > > > > > > > > > >> > +  if (signo == SIGKILL) {
> > > > > > > > > > > > > > > > >> >            force_sig(SIGKILL);
> > > > > > > > > > > > > > > > >> > -  else
> > > > > > > > > > > > > > > > >> > -          force_sig_fault(signo, code, addr);
> > > > > > > > > > > > > > > > >> > +  } else {
> > > > > > > > > > > > > > > > >> > +          struct kernel_siginfo info;
> > > > > > > > > > > > > > > > >> > +          clear_siginfo(&info);
> > > > > > > > > > > > > > > > >> > +          info.si_signo = signo;
> > > > > > > > > > > > > > > > >> > +          info.si_errno = 0;
> > > > > > > > > > > > > > > > >> > +          info.si_code = code;
> > > > > > > > > > > > > > > > >> > +          info.si_addr = addr;
> > > > > > > > > > > > > > > > >> > +          info.si_addr_top_byte = (far >> 56) & far_tb_mask;
> > > > > > > > > > > > > > > > >> > +          info.si_addr_top_byte_mask = far_tb_mask;
> > > > > > > > > > > > > > > > >> > +          force_sig_info(&info);
> > > > > > > > > > > > > > > > >> > +  }
> > > > > > > > > > > > > > > > >> >  }
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> >  void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
> > > > > > > > > > > > > > > > >> > -                      const char *str)
> > > > > > > > > > > > > > > > >> > +                      unsigned long far, const char *str)
> > > > > > > > > > > > > > > > >> >  {
> > > > > > > > > > > > > > > > >> > +  struct kernel_siginfo info;
> > > > > > > > > > > > > > > > >> > +
> > > > > > > > > > > > > > > > >> >    arm64_show_signal(SIGBUS, str);
> > > > > > > > > > > > > > > > >> > -  force_sig_mceerr(code, addr, lsb);
> > > > > > > > > > > > > > > > >> > +
> > > > > > > > > > > > > > > > >> > +  clear_siginfo(&info);
> > > > > > > > > > > > > > > > >> > +  info.si_signo = SIGBUS;
> > > > > > > > > > > > > > > > >> > +  info.si_errno = 0;
> > > > > > > > > > > > > > > > >> > +  info.si_code = code;
> > > > > > > > > > > > > > > > >> > +  info.si_addr = addr;
> > > > > > > > > > > > > > > > >> > +  info.si_addr_lsb = lsb;
> > > > > > > > > > > > > > > > >> > +  info.si_addr_top_byte = far >> 56;
> > > > > > > > > > > > > > > > >> > +  info.si_addr_top_byte_mask = 0xff;
> > > > > > > > > > > > > > > > >> > +  force_sig_info(&info);
> > > > > > > > > > > > > > > > >> >  }
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> I have a real problem with this construction.  force_sig_info is not an
> > > > > > > > > > > > > > > > >> interface that should be used for anything except to define a wrapper
> > > > > > > > > > > > > > > > >> that takes it's parameters.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you elaborate?  How would you do this king of thing.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > There are no other uses of force_sig_info in architecture code.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I just removed them _all_ because they were almost all broken.
> > > > > > > > > > > > > > > > In fact your mcerr case is broken because it uses two different
> > > > > > > > > > > > > > > > union members simultantiously.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is that really broken? I thought that the Linux kernel deliberately
> > > > > > > > > > > > > > > didn't care about strict aliasing rules (the top-level Makefile passes
> > > > > > > > > > > > > > > -fno-strict-aliasing) so I thought that it was valid in "Linux kernel
> > > > > > > > > > > > > > > C" even though from a standards point of view it is invalid. (That
> > > > > > > > > > > > > > > being said, this is probably moot with my proposed changes below
> > > > > > > > > > > > > > > though.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have a feeling that -fno-strict-aliasing only allows you to _read_ a
> > > > > > > > > > > > > > different union member from the one previously written.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Writing a different member from the last one written can still splatter
> > > > > > > > > > > > > > on the other members IIUC.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be better to keep things separate rather than risk
> > > > > > > > > > > > > > incorrectness just to save a few bytes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > IMHO -fno-strict-aliasing is no excuse for gratuitous type-punning.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So I am looking for something like force_sig_mcerr or force_sig_fault
> > > > > > > > > > > > > > > > that includes your new information that then calls force_sig_info.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I know of no other way to safely use the siginfo struct.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So you want something like:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > int force_sig_fault_with_ignored_bits(int signo, int code, void __user
> > > > > > > > > > > > > > > *addr, uintptr_t addr_ignored, uintptr_t addr_ignored_mask);
> > > > > > > > > > > > > > > int force_sig_mceerr_with_ignored_bits(int code, void __user *addr,
> > > > > > > > > > > > > > > short lsb, uintptr_t addr_ignored, uintptr_t addr_ignored_mask);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > in kernel/signal.c and the code in arch/arm64 would call that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > AIUI we absolutely need a forced signal here, we need to supply
> > > > > > > > > > > > > > > > > metadata, and we don't have to open-code all that at every relevant
> > > > > > > > > > > > > > > > > signal generation site...
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> It is not clear to me that if you have adapted siginfo_layout.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Garbled sentence?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Looks like.  One of the pieces of code that needs to change
> > > > > > > > > > > > > > > > when siginfo gets updated is siginfo_layout so that the structure
> > > > > > > > > > > > > > > > can be properly decoded and made sense of.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I am not seeing anything like that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Okay, this has to do with copying between the compat and non-compat
> > > > > > > > > > > > > > > versions of the struct? Sure, I can update that, although the code
> > > > > > > > > > > > > > > would be basically non-functional on arm64 because TBI isn't supported
> > > > > > > > > > > > > > > on 32-bit ARM.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> > > > > > > > > > > > > > > > >> > index cb3d6c267181..6dd82373eb2d 100644
> > > > > > > > > > > > > > > > >> > --- a/include/uapi/asm-generic/siginfo.h
> > > > > > > > > > > > > > > > >> > +++ b/include/uapi/asm-generic/siginfo.h
> > > > > > > > > > > > > > > > >> > @@ -91,6 +91,14 @@ union __sifields {
> > > > > > > > > > > > > > > > >> >                            char _dummy_pkey[__ADDR_BND_PKEY_PAD];
> > > > > > > > > > > > > > > > >> >                            __u32 _pkey;
> > > > > > > > > > > > > > > > >> >                    } _addr_pkey;
> > > > > > > > > > > > > > > > >> > +#ifdef __aarch64__
> > > > > > > > > > > > > > > > >> > +                  /* used with all si_codes */
> > > > > > > > > > > > > > > > >> > +                  struct {
> > > > > > > > > > > > > > > > >> > +                          short _dummy_top_byte;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ^ What's this for?  I don't have Eric's insight here.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We would need a short's worth of padding in order to prevent the
> > > > > > > > > > > > > > > fields from occupying the same address as si_addr_lsb.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> > +                          unsigned char _top_byte;
> > > > > > > > > > > > > > > > >> > +                          unsigned char _top_byte_mask;
> > > > > > > > > > > > > > > > >> > +                  } _addr_top_byte;
> > > > > > > > > > > > > > > > >> > +#endif
> > > > > > > > > > > > > > > > >> >            };
> > > > > > > > > > > > > > > > >> >    } _sigfault;
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Why the _dummy_top_byte?  Oh I see it should be spelled "short _addr_lsb;".
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Please remove the "#ifdef __aarch64__".  If at all possible we want to
> > > > > > > > > > > > > > > > >> design this so any other architecture who has this challenge can use the
> > > > > > > > > > > > > > > > >> code.  The kind of code does not get enough attention/maintenance if it
> > > > > > > > > > > > > > > > >> is built for a single architecture.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Seems reasonable. I was recently made aware that RISC-V was
> > > > > > > > > > > > > > > considering a similar feature:
> > > > > > > > > > > > > > > https://lists.riscv.org/g/tech-tee/topic/risc_v_tbi_proposal/72855478
> > > > > > > > > > > > > > > I would have opted to expand this to other architectures on an
> > > > > > > > > > > > > > > as-needed basis, but I'd also be fine with having it on all
> > > > > > > > > > > > > > > architectures from the start.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we make this arch-independent, we have an additional concern, which
> > > > > > > > > > > > > > > is "what if some future architecture wants more than one byte here?"
> > > > > > > > > > > > > > > For example, an architecture may have a "top-two-bytes-ignore"
> > > > > > > > > > > > > > > feature, which would imply two-byte (misnamed) "si_addr_top_byte" and
> > > > > > > > > > > > > > > "si_addr_top_byte_mask" fields. And the RISC-V proposal potentially
> > > > > > > > > > > > > > > implies many more ignored bits (see slide 13 of the presentation). The
> > > > > > > > > > > > > > > maximum size that these fields can possibly be is the size of a
> > > > > > > > > > > > > > > pointer, and with that there wouldn't be enough room in the padding at
> > > > > > > > > > > > > > > this point to accommodate the new fields.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That basically implies your earlier suggestion of adding a union
> > > > > > > > > > > > > > > member here to accommodate future expansion of the union, and adding
> > > > > > > > > > > > > > > the new fields after the union. I'm happy to make that change, with
> > > > > > > > > > > > > > > the fields renamed "si_addr_ignored" and "si_addr_ignored_mask".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think what we need here is basically a flags word.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So long as we keep a flag spare to indicate the existence of a further
> > > > > > > > > > > > > > flags word, we can extend as needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > How the existence of the first flags words is detected is another
> > > > > > > > > > > > > > problem.  If it only applies for newly-defined si_code values, then
> > > > > > > > > > > > > > I guess si_code may be sufficient.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Existing kernels will zero-initialize unused regions of the siginfo
> > > > > > > > > > > > > data structure. The zero-initialization of the padding at the end of
> > > > > > > > > > > > > the struct is done by the clear_user call here:
> > > > > > > > > > > > > https://github.com/torvalds/linux/blob/3e08a95294a4fb3702bb3d35ed08028433c37fe6/kernel/signal.c#L3193
> > > > > > > > > > > > >
> > > > > > > > > > > > > and the zero-initialization of the padding between fields and unused
> > > > > > > > > > > > > union members is done by the clear_siginfo function which the kernel
> > > > > > > > > > > > > calls when initializing the data structure:
> > > > > > > > > > > > > https://github.com/torvalds/linux/blob/3e08a95294a4fb3702bb3d35ed08028433c37fe6/include/linux/signal.h#L20
> > > > > > > > > > > > >
> > > > > > > > > > > > > Therefore, a flag word value of 0 may be used to detect a lack of
> > > > > > > > > > > > > support for flagged fields.
> > > > > > > > > > > >
> > > > > > > > > > > > It's not enough that we do this today.  We would have had to do it back
> > > > > > > > > > > > to the dawn of time (though in the arm64 case I guess we just need to go
> > > > > > > > > > > > back to when the arch/arm64 was merged).
> > > > > > > > > > > >
> > > > > > > > > > > > v2.6.12:kernel/signal.c:copy_siginfo_to_user() suggests that this wasn't
> > > > > > > > > > > > always the case, so unused parts of siginfo could be full of old junk
> > > > > > > > > > > > from the user stack, if the kernel is sufficiently old.
> > > > > > > > > > > >
> > > > > > > > > > > > If we're trying to do something generic that makes sense on all arches,
> > > > > > > > > > > > this matters.  I may have misunderstood something about the code though.
> > > > > > > > > > >
> > > > > > > > > > > Hmm, I think you're right. The current behavior was introduced by
> > > > > > > > > > > commit c999b933faa5e281e3af2e110eccaf91698b0a81 which was first
> > > > > > > > > > > released in 4.18. So if an application wants to be compatible with
> > > > > > > > > > > pre-4.18 kernels then there would need to be some other way to
> > > > > > > > > > > indicate that the fields are valid. Probably the simplest way would be
> > > > > > > > > > > to have the application issue a uname(2) syscall and check the kernel
> > > > > > > > > > > version before reading these fields. I have a couple of other ideas
> > > > > > > > > > > that don't rely on version detection, if we'd prefer to avoid that.
> > > > > > > > > > > (They are somewhat ugly, but our hand is forced by backwards
> > > > > > > > > > > compatibility.)
> > > > > > > > > > >
> > > > > > > > > > > One idea is to re-purpose the si_errno field as a flags field for
> > > > > > > > > > > certain signal numbers. I checked a few kernel releases going back to
> > > > > > > > > > > 2.6.18 and it looks like the field is set to 0 except in the following
> > > > > > > > > > > circumstances:
> > > > > > > > > > > - sending a hardware breakpoint (SIGTRAP/TRAP_HWBKPT)
> > > > > > > > > > > - seccomp failures (SIGSYS/SYS_SECCOMP)
> > > > > > > > > > > - user-defined signal via kill_pid_usb_asyncio
> > > > > > > > > > > - SIGSWI in 3.18 and before (code since removed)
> > > > > > > > > > >
> > > > > > > > > > > It is also set to EFAULT for certain SIGSEGV/SEGV_MAPERR signals on
> > > > > > > > > > > powerpc since commit c96c4436aba4c12f1f48369f2f90bc43e12fe36c, which
> > > > > > > > > > > is currently unreleased. So if we wanted to go this route for SIGSEGV
> > > > > > > > > > > we would need to stop the kernel from setting si_errno to EFAULT for
> > > > > > > > > > > this signal before the 5.8 release.
> > > > > > > > > > >
> > > > > > > > > > > Another idea was to have userspace set a flag in sa_flags when
> > > > > > > > > > > registering a signal handler meaning "this signal handler requires
> > > > > > > > > > > unknown siginfo fields to be zeroed", and have existing kernels reject
> > > > > > > > > > > the syscall due to an unknown flag being set, but unfortunately this
> > > > > > > > > > > won't work because existing kernels do not reject sigaction syscalls
> > > > > > > > > > > with unknown flags set in sa_flags. A perhaps more radical idea in
> > > > > > > > > > > this vein would be to claim some of the upper bits of the signal
> > > > > > > > > > > number as flags that will cause the syscall to be rejected if set and
> > > > > > > > > > > unknown to the kernel. Existing kernels (going back to at least
> > > > > > > > > > > 2.6.18) contain this code in do_sigaction:
> > > > > > > > > > >
> > > > > > > > > > >         if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
> > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > >
> > > > > > > > > > > and vald_signal is defined as:
> > > > > > > > > > >
> > > > > > > > > > > static inline int valid_signal(unsigned long sig)
> > > > > > > > > > > {
> > > > > > > > > > >         return sig <= _NSIG ? 1 : 0;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > All architectures define _NSIG as a value <= 128, so they will reject
> > > > > > > > > > > a signal number with any of bits 8-31 set. This means that we can use
> > > > > > > > > > > any of those bits for mandatory flags. Most likely we could use bit 30
> > > > > > > > > > > (expanding down as necessary), as it keeps the signal number positive
> > > > > > > > > > > and permits future expansion of the signal number range.
> > > > > > > > > >
> > > > > > > > > > Does the signal core code actually gurantee to zero the unused fields?
> > > > > > > > > > Unless the fields are poked in by hand this is fraught with subtlelies,
> > > > > > > > > > especially when unions are involved.  (I'm sure the code tries to do it,
> > > > > > > > > > but I've not eyeballed it in detail...)
> > > > > > > > >
> > > > > > > > > It memsets the siginfo structure before setting the fields and sending
> > > > > > > > > the signal (grep for clear_siginfo which is just a memset; you should
> > > > > > > > > find a call before all callers of force_sig_info). Memset is the right
> > > > > > > > > approach here since unlike setting fields by hand it clears padding
> > > > > > > > > which could lead to information leaks from the kernel. IIUC this is
> > > > > > > > > the reason why Eric wants all of the signals to be raised via wrappers
> > > > > > > > > in kernel/signal.c instead of via force_sig_info directly (to make
> > > > > > > > > this aspect easier to audit).
> > > > > > > >
> > > > > > > > My impression was that the reason for this model is partly to ensure
> > > > > > > > that siginfo fields are populated more consistently.  When it was all
> > > > > > > > down to the individual callers, inconsistencies creeped in.
> > > > > > > >
> > > > > > > > With regard to memset(), this is not a complete defence against data
> > > > > > > > leakage.  Assigning to a struct member can set any or all padding in
> > > > > > > > the struct to random garbage (consider write-combining of neighboring
> > > > > > > > member writes into a single larger accesses in asm for example).  The
> > > > > > >
> > > > > > > I don't believe that LLVM will store to padding like this. I don't
> > > > > > > know about GCC, though, but I wouldn't be surprised if this is
> > > > > > > something that the kernel would want to turn off in "kernel C" (like
> > > > > > > it turns off strict aliasing) specifically because of the information
> > > > > > > leak issue.
> > > > > >
> > > > > > Again, the issue is not future kernel builds -- we can always find a way
> > > > > > to fix the behaviour for those -- but past kernel builds.
> > > >
> > > > I thought that the whole point of the "bit in the signal number" (or
> > > > SI_CODEX or whatever) was that we didn't need to worry about the
> > > > behavior of past kernel builds?
> > >
> > > It depends on what we use the new flag(s) for.
> > >
> > > If the flag means just that unused padding is safely zeroed, that could
> > > work -- but we'd want high confidence that it really is zeroed even in
> > > wacky configurations.
> > >
> > > > > > > > only way to avoid this is to ensure that the struct is 100%
> > > > > > > > padding-free, and that each member of a union is the same size.  A
> > > > > > > > quick clance at <uapi/asm-generic/siginfo.h> confirms that this is not
> > > > > > > > the case.
> > > > > > > >
> > > > > > > > This might need to be looked at separately.
> > > > > > > >
> > > > > > > > But it does mean, strictly speaking, that we can't reliably add new
> > > > > > > > fields anywhere that there was previously padding: assigning to
> > > > > > > > neighboring members can still fill those with garbage after the
> > > > > > > > memset().
> > > > > > >
> > > > > > > ...but this is largely moot because I'm not proposing to add new
> > > > > > > fields in the padding any more (because the fields needed to become
> > > > > > > larger in order to accommodate future hypothetical architectures which
> > > > > > > might want to use the fields, and thus they wouldn't fit in the
> > > > > > > padding). The siginfo.h diff would be something like:
> > > > > > >
> > > > > > > diff --git a/include/uapi/asm-generic/siginfo.h
> > > > > > > b/include/uapi/asm-generic/siginfo.h
> > > > > > > index cb3d6c267181..4a2fe257415d 100644
> > > > > > > --- a/include/uapi/asm-generic/siginfo.h
> > > > > > > +++ b/include/uapi/asm-generic/siginfo.h
> > > > > > > @@ -91,7 +91,10 @@ union __sifields {
> > > > > > >                                 char _dummy_pkey[__ADDR_BND_PKEY_PAD];
> > > > > > >                                 __u32 _pkey;
> > > > > > >                         } _addr_pkey;
> > > > > > > +                       void *_pad[6];
> > > > > > >                 };
> > > > > > > +               uintptr_t _ignored_bits;
> > > > > > > +               uintptr_t _ignored_bits_mask;
> > > > > >
> > > > > > This _is_ in padding: the tail-padding of the (previously smaller)
> > > > > > _sigfault.  Again, the compiler was allowed to populate this area with
> > > > > > junk before these fields were added.
> > > > > >
> > > > > > I agree that it seems fairly unlikely that the compiler would have been
> > > > > > overwriting this in normal circumstances, but that's not a guarantee.
> > > > > > My worry is that if this goes wrong, it will go wrong silently and
> > > > > > unpredictably.
> > > > > >
> > > > > > >         } _sigfault;
> > > > > > >
> > > > > > >         /* SIGPOLL */
> > > > > > >
> > > > > > > or with a "uintptr_t _flags" added in before _ignored_bits if we go with that.
> > > > > > >
> > > > > > > > > > Using unused bits in the signal number to turn on new functionality
> > > > > > > > > > feels risky.  As currently specified, this is just a number.  Since
> > > > > > > > > > today a successful sigaction(n ...) guarantees that n is a valid signal
> > > > > > > > > > number, reasonable code like the following would trigger a buffer
> > > > > > > > > > overrun if we start trying to encode anything else in there:
> > > > > > > > > >
> > > > > > > > > > struct sigaction actions[NSIG];
> > > > > > > > > >
> > > > > > > > > > int do_something( ... )
> > > > > > > > > > {
> > > > > > > > > >         ...
> > > > > > > > > >
> > > > > > > > > >         if (!sigaction(n, sa, ...)) {
> > > > > > > > > >                 actions[n] = *sa;
> > > > > > > > > >                 return 0;
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > >         ...
> > > > > > > > > > }
> > > > > > > > >
> > > > > > > > > I imagine the bit in the signal number being set by the direct caller
> > > > > > > > > to sigaction, and we could specifically recommend that calling
> > > > > > > > > pattern. In that case, your "n" wouldn't have the bit set in it. It
> > > > > > > >
> > > > > > > > I can imagine this too, but that doesn't mean that software does it.
> > > > > > > >
> > > > > > > > If the above kind of thing exists in a framework or library somewhere,
> > > > > > > > we could get problems.  Similarly, a pre-existing LD_PRELOAD framework
> > > > > > > > that provides a wrapper for sigaction may now go wrong even if your
> > > > > > > > pattern is followed -- i.e., the caller thinks it's calling sigaction
> > > > > > > > directly but in fact it isn't.
> > > > > > >
> > > > > > > I'm aware of one library like that. It's called libsigchain, and it
> > > > > > > has an early bounds check:
> > > > > > > https://cs.android.com/android/platform/superproject/+/master:art/sigchainlib/sigchain.cc;l=371
> > > > > > >
> > > > > > > Until the library is changed to recognize the flag, calling code would
> > > > > > > see the return value of -1 as if the kernel failed the syscall, and
> > > > > > > would fall back to the code for old kernels.
> > > > > >
> > > > > > But only after some bad dereferences.  If these were writes, this means
> > > > > > that memory _may_ be silently corrupted (I don't say it't likely in a
> > > > > > given case, and we cannot pick a flag bit that makes this impossible).
> > > >
> > > > You're talking about libsigchain, right? I don't see any bad
> > > > references, the function returns after noticing the bounds check
> > > > failure.
> > >
> > > Yes, I confused myself by reading Handler() out of context.  The kernel
> > > will invoke this with signo to a real signal number (without any flags).
> > >
> > > The sigaction wrapper does the bounds check before doing anything else,
> > > just as you say -- so that looks fine.
> > >
> > > (Side question: is all this thread-safe?  Is there some implicit locking
> > > somewhere?)
> >
> > I think maybe it isn't? There seem to be possible races on the
> > handler_ field. One possibility is that the function could race with
> > itself on another thread, which could be fixed via locking, but it
> > would also need to handle races between itself and the signal handler,
> > most likely by blocking the signal while setting it.
>
> Hmmm, tricky... anyway, that's not my problem ;)
>
> > > > > > So, _even though the user program is correct_, our change may trigger
> > > >
> > > > Let's say that you were talking about some other library and not
> > > > libsigchain. Such an interceptor wouldn't be correct though, it failed
> > > > to account for our change to the syscall semantics. If the accesses
> > > > were before the syscall (or the bounds check), then the interceptor
> > > > would not have been correct in the first place because POSIX requires
> > > > returning -1 with errno=EINVAL (and not crashing) if the signal number
> > > > is invalid.
> > > >
> > > > > > the corruption of arbitrary user memory.  This what I mean by an ABI
> > > > > > break.  The fact that the corruption is not done by the syscall itself
> > > > > > is no excuse.
> > > >
> > > > At some point, though, accommodating interceptors becomes pretty much
> > > > tantamount to saying "we can never change anything". Even just adding
> > > > a field to __sifields (which is pretty much required for what we need
> > > > to do) could break things in the presence of some interceptors because
> > > > the interceptor could be copying the fields manually to a new data
> > > > structure before calling the user's signal handler (e.g. because it
> > > > wants to defer the signal until later) and miss our new field. I think
> > > > most of the other ideas we're discussing fail to meet this bar as well
> > > > and I'll go into more details later on.
> > >
> > > I agree we cannot always avoid breaking such things.  But we should do
> > > our best to avoid it.
> >
> > I think that given the hand that we've been dealt, no matter what we
> > do, we can't really avoid risking breaking something. The relevant
> > questions are "what are we going to risk breaking", "how much risk is
> > there", "will it be easily noticed/fixable", and "once we're on the
> > other side of the potential breakage, will we find ourselves in a
> > position where changing things involves less breakage risk".
> >
> > > > > > We also fail to notice failures in sigaddset() etc., though in this code
> > > > > > it looks like that should not matter.
> > > >
> > > > Maybe you're looking at the handler ("SignalChain::Handler")? The bit
> > > > wouldn't be set in the signo argument to the handler. I'm talking
> > > > about line 371 of the code I linked, in the sigaction interceptor
> > > > "__sigaction" (it looks like sometimes the link doesn't take you to
> > > > the correct line for some reason).
> > >
> > > Ack, I confused myself.
> > >
> > > > > > > In general I think that any library like this with independent
> > > > > > > tracking of the kernel's purported signal handler state would need to
> > > > > > > be very sensitive to which syscalls are capable of setting signal
> > > > > > > handlers, what their semantics are, and so on. This applies to any
> > > > > > > change that we might make to the signal handler interface. So for
> > > > > > > example, if we introduced a new syscall as you propose below, and the
> > > > > > > library hasn't been updated to recognize the new syscall, it will
> > > > > > > silently miss changes in signal handler state caused by the new
> > > > > > > syscall.
> > > > > > >
> > > > > > > At the end of this argument lies "we can never change anything about
> > > > > > > how signal handlers work because it could break some interposing
> > > > > > > library somewhere" -- replace "signal handlers" with any kernel
> > > > > > > feature whose behavior may be modified by an interposing library if
> > > > > > > you like -- and I don't think we want to go that far. As far as I
> > > > > > > know, this isn't really the kernel's business anyway -- the kernel's
> > > > > > > stable ABI contract starts and ends with the syscall interface and not
> > > > > > > some library on top.
> > > > > > >
> > > > > > > That being said, we should perhaps try to define our interface so that
> > > > > > > something reasonable will probably happen if there is such a library
> > > > > > > and it hasn't been updated. With the new syscall, the library will
> > > > > > > sometimes silently fail to work in some non-local fashion. With the
> > > > > > > flag bit in the signal number, the library will either cause the
> > > > > > > caller to fall back to the old kernel code path (if there is a bounds
> > > > > > > check) or likely crash loudly (if there is no bounds check). To me,
> > > > > > > the "flag bit in the signal number" behavior seems more reasonable,
> > > > > > > since either something correct or something easy to debug will
> > > > > > > probably happen at runtime.
> > > > > > >
> > > > > > > > > could only appear in newly-written code that doesn't follow our
> > > > > > > > > recommendations, and there are already plenty of much more likely ways
> > > > > > > > > to cause buffer overflows in C code that doesn't follow
> > > > > > > > > recommendations anyway. (And even if such a buffer overflow occurred,
> > > > > > > > > it would very likely be caught early in development by the MMU due to
> > > > > > > > > the magnitude of the number 1<<30.)
> > > > > > > >
> > > > > > > > Choosing the bit value is hard.  If shitfing it overflows, this can
> > > > > > > > trigger random undefined behaviour in the compiler in addition to (or
> > > > > > > > perhaps instead of) an out-of-bounds access or segfault.
> > > > > > >
> > > > > > > It wouldn't overflow on a 64-bit architecture assuming normal array
> > > > > > > indexing (the index would be promoted to pointer width before being
> > > > > > > scaled to the array element size), and to begin with the users of this
> > > > > > > would be 64-bit.
> > > > > >
> > > > > > Unless we don't offer this feature for 32-bit at all (possible, if ugly)
> > > > > > we can't stop people using it.
> > > >
> > > > My point is that the problem in the interceptor library would probably
> > > > be noticed on 64-bit (since that's what most people use these days),
> > > > which would probably result in it being fixed by the time it reaches
> > > > 32-bit users.
> > >
> > > Agreed.  But we shouldn't take such bets unless we really have to.
> > >
> > > > > > > > If shifting it doesn't overflow, we might still fall into a valid
> > > > > > > > mapping, though I'd agree a segfault is more likely.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > I think it would be cleaner for to add a single flag field that can be
> > > > > > > > > > used for detecting other extensions, and request it via a new sa_flags
> > > > > > > > > > bit.  This removes the need for sematically useless zeroing of unused
> > > > > > > > > > fields (though for hygiene and backwards compatibility reasons we would
> > > > > > > > > > probably want to carry on zeroing them anyway).
> > > > > > > > > >
> > > > > > > > > > I can see no simpler way to add supplementary siginfo fields for
> > > > > > > > > > existing si_codes.  For si_codes that didn't exist before the zeroing
> > > > > > > > > > came in we could still detect optional si_code-specific fields via
> > > > > > > > > > zeroing, but it seems messary to have two ways of detecting extensions.
> > > > > > > > >
> > > > > > > > > That would certainly be cleaner if it worked, but that would only be
> > > > > > > > > the case if old kernels rejected unknown bits in sa_flags, and
> > > > > > > > > unfortunately they don't. With the bit in the signal number, the "old
> > > > > > > >
> > > > > > > > Hmm, that is a problem I wasn't aware of.
> > > > > > > >
> > > > > > > > > kernels reject" behavior admits relatively straightforward usage code:
> > > > > > > > >
> > > > > > > > > void set_segv_handler(void) {
> > > > > > > > >   struct sigaction sa;
> > > > > > > > >   sa.sa_sigaction = handle_segv;
> > > > > > > > >   sa.sa_flags = SA_SIGINFO;
> > > > > > > > >   if (sigaction(SIGSEGV | SF_CLEAR_UNKNOWN_FIELDS, &sa, 0) < 0) { //
> > > > > > > > > succeeds in new kernels, fails in old kernels
> > > > > > > > >     sa.sa_sigaction = clear_fields_and_handle_segv;
> > > > > > > > >     if (sigaction(SIGSEGV, &sa, 0) < 0) // succeeds in old kernels
> > > > > > > > >       perror("sigaction");
> > > > > > > > >   }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > void clear_fields_and_handle_segv(int signum, siginfo_t *sa, void *ctx) {
> > > > > > > > >   sa->si_future_field = 0;
> > > > > > > > >   handle_segv(signum, sa, ctx);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > void handle_segv(int signum, siginfo_t *sa, void *ctx) {
> > > > > > > > >   // At this point, si_future_field will have the value 0 in old
> > > > > > > > > kernels and the kernel-supplied value in new kernels.
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Imagine if we moved the flag SF_CLEAR_UNKNOWN_FIELDS from the signal
> > > > > > > > > number to sa_flags. In that case, the first sigaction would succeed in
> > > > > > > > > old kernels so handle_segv wouldn't know whether it can safely read
> > > > > > > > > from si_future_field. With the sa_flags approach, you would need
> > > > > > > > > kernel version number checking via uname before setting the flag in
> > > > > > > > > sa_flags, and at that point why even have the flag in sa_flags at all
> > > > > > > > > since you could just have the signal handler conditionally read from
> > > > > > > > > si_future_field based on the uname?
> > > > > > > >
> > > > > > > > Software setting SA_SIFLAGS (or whatever) is new by definition, since
> > > > > > > > it would be using a new #define.  So it might be reasonable to put the
> > > > > > > > burden on that software to verify that the flag was really accepted by
> > > > > > > > the kernel, by reading it back.
> > > > > > >
> > > > > > > That doesn't seem like a good idea even if it worked, because it could
> > > > > > > lead to race conditions. If the si_flags-reading signal handler were
> > > > > > > invoked in response to a signal between when you set it and when you
> > > > > > > ended up replacing it with the fallback signal handler for old
> > > > > > > kernels, the handler may end up reading garbage data from si_flags.
> > > > > >
> > > > > > Not really.  My example may have this problem, but the signal handler
> > > > > > can be written to support both scenarios, based on testing a flag that
> > > > > > the main program sets after verifying that the flag could be set.  Or
> > > > > > the signal could be blocked around establishment (often a good idea for
> > > > > > other reasons).
> > > > > >
> > > > > > But I agree it's a bit gross, and anyway doesn't work due to the fact
> > > > > > that the kernel doesn't filter out unrecognised flags anyway.
> > > > > >
> > > > > > > > Unfortunately, even relatively recent kernels blindly store sa_flags
> > > > > > > > in the kernel without validating it, and so it looks like duff flags
> > > > > > > > can be read back out via a sigaction() call.  Dang.
> > > > > > > >
> > > > > > > >
> > > > > > > > Perhaps a new frontend syscall could be added.  A new libc that knows
> > > > > > > > about this "sigaction2" could use it and mask off problem bits from
> > > > > > > > sa_flags in its sigaction() wrapper before calling sigaction2.  An old
> > > > > > > > libc would call the old sigaction syscall, where we would ignore these
> > > > > > > > new sa_flags bits as before.
> > > > > > >
> > > > > > > I'm not currently in favor of the new syscall but if we do this I
> > > > > > > would keep sigaction and sigaction2 separate. That is, libc sigaction
> > > > > > > should always use the sigaction syscall, and libc sigaction2 should
> > > > > > > always use the sigaction2 syscall. We should avoid libc's sigaction
> > > > > > > having different behavior based on the libc version and kernel
> > > > > > > version, as that would make it harder to reason about its behavior.
> > > > > > > Calling code would need to check for presence of sigaction2 in both
> > > > > > > libc and the kernel, e.g.
> > > > > > >
> > > > > > > __attribute__((weak)) decltype(sigaction2) sigaction2;
> > > > > > >
> > > > > > > void set_segv_handler(void) {
> > > > > > >   struct sigaction sa;
> > > > > > >   sa.sa_sigaction = handle_segv;
> > > > > > >   sa.sa_flags = SA_SIGINFO | SA_SIFLAGS;
> > > > > > >   if (!sigaction2 || sigaction2(SIGSEGV, &sa, 0) < 0) {
> > > > > > >     sa.sa_sigaction = clear_fields_and_handle_segv;
> > > > > > >     sa.sa_flags = SA_SIGINFO;
> > > > > > >     if (sigaction(SIGSEGV, &sa, 0) < 0)
> > > > > > >       perror("sigaction");
> > > > > > >   }
> > > > > > > }
> > > > > >
> > > > > > I guess.  But I share your distaste for adding a new syscall.
> > > > > >
> > > > > > >
> > > > > > > > This may not be a popular approach though, and software wouldn't be able
> > > > > > > > to use our new features until libc is updated to match.
> > > > > > > >
> > > > > > > > If we go down this route, it may provide additional opportunities to fix
> > > > > > > > annoying defects in the old interface.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Note that the same applies to a flag indicating the availability of a
> > > > > > > > > si_flags field in sigaction (just
> > > > > > > > > s/SF_CLEAR_UNKNOWN_FIELDS/SF_HAS_SI_FLAGS/ and
> > > > > > > > > s/si_future_field/si_flags/ in the usage code above). In terms of
> > > > > > > > > SF_CLEAR_UNKNOWN_FIELDS versus SF_HAS_SI_FLAGS I'd be fine either way.
> > > > > > > > >
> > > > > > > > > Another thought that occurred to me is that we may consider
> > > > > > > > > generalizing this a step further and introducing a single flag bit in
> > > > > > > > > the signal number that means "reject unknown flags in sa_flags". This
> > > > > > > > > would mean that we wouldn't need to add any more flag bits to the
> > > > > > > > > signal number in the future, thus limiting this signal number hack to
> > > > > > > > > a single bit; all future mandatory behavior changes could just be put
> > > > > > > > > behind a flag in sa_flags and userspace code would easily be able to
> > > > > > > > > detect missing support for a flag and fall back if necessary. In our
> > > > > > > > > case, this would imply usage code like this:
> > > > > > > > >
> > > > > > > > > void set_segv_handler(void) {
> > > > > > > > >   struct sigaction sa;
> > > > > > > > >   sa.sa_sigaction = handle_segv;
> > > > > > > > >   sa.sa_flags = SA_SIGINFO | SA_CLEAR_UNKNOWN_FIELDS;
> > > > > > > > >   // Succeeds in kernels with SA_CLEAR_UNKNOWN_FIELDS support.
> > > > > > > > >   // Fails in kernels with SF_CHECK_SA_FLAGS support but no
> > > > > > > > > SA_CLEAR_UNKNOWN_FIELDS support (because of the unknown flags check).
> > > > > > > > >   // Fails in kernels without SF_CHECK_SA_FLAGS support (because of
> > > > > > > > > the bounds check on the signal number).
> > > > > > > > >   if (sigaction(SIGSEGV | SF_CHECK_SA_FLAGS, &sa, 0) < 0) {
> > > > > > > > >     sa.sa_sigaction = clear_fields_and_handle_segv;
> > > > > > > > >     sa.sa_flags = SA_SIGINFO;
> > > > > > > > >     // Succeeds in old kernels, no need to use SF_CHECK_SA_FLAGS since
> > > > > > > > > we're using sa_flags from the beginning of time.
> > > > > > > > >     if (sigaction(SIGSEGV, &sa, 0) < 0)
> > > > > > > > >       perror("sigaction");
> > > > > > > > >   }
> > > > > > > > > }
> > > > > > > >
> > > > > > > > As with the other options this could work, but looks like it could
> > > > > > > > break the ABI due to violating the original semantics for the signal
> > > > > > > > number argument.  Perhaps I'm being too paranoid.
> > > > > > >
> > > > > > > There's no ABI being broken here, as long as we consider syscalls to
> > > > > > > be the stable ABI layer. Old kernels are simply rejecting arguments
> > > > > > > that they don't know about yet. By that argument, any introduction of
> > > > > > > a new syscall is an ABI break because it changes the semantics of a
> > > > > > > previously-unallocated syscall number.
> > > > > >
> > > > > > As argued above, I think this is an invalid argument.
> > > > > >
> > > > > > Although any addition will change behaviour (so is a break in some
> > > > > > sense), the key is not to make "surprising" changes.
> > > >
> > > > If we care about interceptors then I don't think "surprising" comes
> > > > into it. It's more a question of "does the anticipated behavior of the
> > > > interceptor match our desired behavior", where "desired" means "most
> > > > likely to avoid silent breakage". We would need to get into the head
> > > > of a potential interceptor author and think about how they would have
> > > > handled the signal number argument, as well as other arguments like
> > > > sa_flags if we want to go that route, and see whether that behavior
> > > > would lead to the desired result.
> > >
> > > That's exactly what I mean by "surprising".
> >
> > Not quite, see below.
> >
> > > However, not every
> > > interceptor author will be making the same assumptions, and not every
> > > bit of software affected will be an interceptor.
> >
> > I can see a couple of ways in which non-interceptor software could be affected:
> >
> > - It's doing something like "call sigaction on every possible signal
> > number in the 31-bit range and end up failing if the syscall
> > succeeded" (e.g. with an OOB write). Perhaps software could be doing
> > something like this in a loop to collect all currently registered
> > signal handlers. That being said, this program:
> >
> > #include <limits.h>
> > #include <signal.h>
> >
> > int main() {
> >   struct sigaction act;
> >   for (int i = 1; i != INT_MAX; ++i) {
> >     sigaction(i, 0, &act);
> >   }
> > }
> >
> > takes around 5 seconds to run on my relatively-fast machine, so I
> > would expect any such code to be noticed as a performance issue and
> > either be changed to be bounded on _NSIG or break on EINVAL.
> >
> > This is probably the largest potential flaw that I can currently see
> > in the "bit in the signal number" idea, since it could conceivably
> > result in userspace code being broken without having first required it
> > to have been changed to make use of the new feature. I'm not convinced
> > that it would be an ABI break though, because the code seems unlikely
> > to exist in this form in the wild because of the performance issue,
> > and you could anyway make the argument that the code is incorrect
> > because, in order to contain a loop like this, it would need to be
> > able to handle large, previously-unknown signal numbers somehow. If we
> > accept that the code is incorrect, a similar line of argument applies
> > as for interceptors (i.e. likely to result in an OOB access which will
> > fail loudly and be easily debugged and fixed).
> >
> > - If we do something that involves introducing a new flag in sa_flags,
> > the flag may be exposed to unaware software via the oldact argument to
> > sigaction, and I suppose that it's conceivable that exposing a
> > previously-unknown flag like this could somehow break something. But
> > this seems like an unreasonable restriction because it would mean that
> > we can never add a flag to sa_flags no matter what.
> >
> > >  So some judgement
> > > needs to be applied.
> >
> > Of course. We need to agree *how* to apply the judgement though.
> >
> > > > In this case, I think we exactly want the interceptor author to have
> > > > thought "oh, it's just a number, I'll (possibly do a bounds check and
> > > > then) use the number as an index into an array". This will lead to one
> > > > of two outcomes: crashing (yes, yes, it won't always crash, but if the
> > > > alternative is that it never crashes and we get silently incorrect
> > > > behavior all of the time, I'll take sometimes crashing) or fail the
> > > > bounds check and pretend to be an old kernel (the latter is
> > > > anticipated by POSIX which requires returning -1/EINVAL for an invalid
> > > > signal number). Each of these behaviors are desirable, as they are
> > > > observable failures, which are more likely to result in fixes than
> > > > silent ones.
> > >
> > > Agreed, except wanting the author to have thought something doesn't
> > > ensure that they actually did think that.
> >
> > True, but if our goal is only to accommodate reasonably written
> > interceptors, we don't actually need to ensure anything here.
> >
> > > > > > Having something random happen when setting a previously reserved flag
> > > > > > bit, or when issuing a syscall when an unknown syscall number, or not
> > > > > > surprising at all.
> > > >
> > > > Introducing a new syscall is right out in this model. The interceptor
> > > > author wouldn't have anticipated our introducing a new syscall, so the
> > > > new syscall wouldn't be intercepted and calls to the new syscall would
> > > > silently bypass the interceptor. For example, adding sigaction2 could
> > > > result in signal handlers being set without the interceptor's
> > > > knowledge.
> > >
> > > Agreed.  My sentence was a bit mangled: I mean to say "Having something
> > > random happen when [...] issuing a syscall *with* an unknown syscall
> > > number *is* not surprising at all."
> > >
> > > I agree that adding a new syscall is problematic if we want to avoid
> > > breaking existing interceptors in particular.  Other types of code are
> > > much less likely to be affected by the addition of new syscalls.
> >
> > Right, and this to me is a case in point for why I would say that
> > "surprising" isn't the right frame of analysis here. My analysis seems
> > to generally be that "anticipated interceptor behavior matches desired
> > behavior" is positively correlated with "surprising" (i.e. the
> > interceptor viewpoint is the dual of the user viewpoint), so if we
> > care about interceptors we may end up making a "surprising" change
> > even though it doesn't intuitively seem like the right thing to do.
>
> You're right that interceptors are different from normal callers.  I'm
> not sure I follow your argument, but an alternative way of looking at
> it might be to say that an interceptor is both an implementation of an
> interface and a caller of the same interface.  Since API specs are
> rarely complete enough to cover the corner cases that arise from this,
> full portability is hard to achieve on top of an evolving kernel.
>
> However, I think this interceptor thing is a bit of a red herring.  I
> just intended that as an illustration of the kind of code that might
> fall foul.  This doesn't mean that it's 100% certain that no other
> software can be affected.
>
> The starting points for this discussion were: "is it reasonable for a
> caller to pass an unvalidated signal number to sigaction(), and rely on
> sigaction() to validate it?" and "is it reasonable to assume that a
> signal number accepted by sigaction() fits the POSIX specification of
> a valid signal number?"
>
> I think yes; you aren't (or weren't) convinced.
>
> The mere fact that it's hard to agree suggests to me that the
> specification is too weak to extend safely in this area.  Unfortunately,
> it's rather weak for sa_flags too, although a non-full flags argument
> does at least suggest that future extensions might appear.
>
> > > > Regarding a sa_flags bit, let's get inside the head of the interceptor
> > > > author again. How would they handle a flag bit that they don't
> > > > recognize when replacing the signal handler? It wouldn't be correct to
> > > > just pass it through to the kernel, or drop the flag on the floor, as
> > > > it might be semantically meaningful (and thus could change the calling
> > > > convention as SA_SIGINFO does, or change the meaning of fields in
> > > > siginfo, as SA_CODEX would do). A correctly written sigaction
> > > > interceptor should probably abort the program upon encountering an
> > > > unknown flag (thus giving a human a chance to update the interceptor),
> > > > but chances are that they don't. Ignoring all but a few flags (and
> > > > passing a fixed set of flags to the kernel) seems to be what
> > > > libsigchain does, and in the case of SA_CODEX it would seem to result
> > > > in desirable behavior (but I suspect that it isn't handling the other
> > > > flags correctly), but I could also see an interceptor author just
> > > > passing it unchanged to the kernel without checking it (perhaps
> > > > because they didn't think about these issues, and because that didn't
> > > > matter until now, with the exception of from-the-beginning-of-time
> > > > flags like SA_SIGINFO). And with SA_CODEX that could lead to silent
> > > > misreading of si_code in the interceptor's signal handler, if it
> > > > hasn't been updated to use the new macros.
> > >
> > > Agreed.  I've tried to implement things rather like this in the past,
> > > and how to interpret the flags is a tricky issue.  Some of the flags are
> > > impossible to emulate even when you know what they mean, in particular
> > > SA_NODEFER and SA_RESTART.
> > >
> > > Making new flags safe to ignore and harmless to set of you don't know
> > > what they mean is the safest approach, but not always possible (I think
> > > I managed this with by suggestion below, though).
> >
> > Again, this suggestion could lead to silent failures in an interceptor, if:
> > - the interceptor passes the sa_flags through to the kernel unchanged
> > (or otherwise doesn't touch SA_CODEX)
> > - the interceptor replaces the user's sa_sigaction
> > - the interceptor's replacement sa_sigaction tests the provided si_code.
> >
> > Maybe you're not concerned about that, though? At least to me it seems
> > in the same ballpark of likelihood as the ways in which things could
> > go wrong with the signal number bit.
>
> I agree this is a concern, and perhaps a bit nastier in practice than
> side-effects of setting random bits in the signal number.
>
> Personally I do tend to be paranoid about flags arguments and try to
> police them in any code that isn't a trivial pass-through, but this
> doesn't mean that all code out there does it.  (Including the kernel's
> sigaction()!)
>
> > > Ideally, a flags field should be specified with rules that say exactly
> > > what to do with flags you don't recognise.  Sadly this is usually not
> > > thought about until it's too late.
> >
> > It perhaps isn't too late to introduce such rules for sigaction if we
> > adopt the signal number bit and we make it mean "reject unknown
> > flags".
>
> If Eric likes the idea then fair enough, but as I've tried to argue this
> may still just be moving the problem around rather than solving it.
>
>
> As a final random idea to add to the mix, we could add two or more
> flags in sa_flags, and require the kernel to transform them in a
> specific way, say:
>
> #define SA_WANT_FLAGS 0x00c700000
> #define SA_HAVE_FLAGS 0x009200000
> #define SA_FLAGS_MASK 0x00ff00000
>
> volatile sig_atomic_t have_flags = 0;
>
>         sa.sa_flags |= SA_WANT_FLAGS;
>         if (sigaction(n, &sa, NULL))
>                 if (!sigaction(n, NULL, &sa) &&
>                                 (sa.sa_flags & SA_FLAGS_MASK) == SA_HAVE_FLAGS)
>                         have_flags = 1;
>
> This is at least proof against "dumb readback".
>
> Provided that the handler can cope with the have_flags == 0 case and
> just reads the flag once per call, I don't think we would need to worry
> about races.
>
> Of course, an interceptor that doesn't understand this mechanism and
> munges or manufactures its own siginfo might still fail to properly
> initialise our new field before passing it on to a signal handler that
> is expecting it.  But that's already broken: such an interceptor might
> also not understand new si_codes that the client code absolutely relies
> on.  And new si_codes _do_ get added (that's another extensibility fail
> in the existing signal API).
>
>
> So... overall, maybe a bit in the signal number isn't a lot worse, and
> perhaps it _will_ lead to cleaner failures.
>
> Really, I don't see a way to solve it properly without a new API.

I started on implementing my signal number bit idea, and in the
process of doing so came up with another idea that may be better from
the "don't abuse existing arguments" perspective. It involves a
sigaction protocol similar to the one that you describe above, but it
only requires one new bit (plus one bit per new flag) so it is less
wasteful of sa_flags bits.

The idea is twofold:

1. Require the kernel to clear unknown flag bits in sa_flags when
passing them back in oldact. I suppose that this is technically a
behavior change for sigaction, but critically, this change in behavior
only applies to unallocated flags, which we are free to change the
meaning of. We can simply define each existing unallocated flag bit to
mean "clear this bit in oldact (unless the bit becomes supported in
the future)". There was already code doing something similar in a
limited fashion on x86, which we can remove by using this approach.

2. Define a flag bit SA_UNSUPPORTED which will never be supported by
the kernel. Now userspace can use the fact that the bit has been
cleared to mean that it can trust that other unsupported bits have
also been cleared.

Now we may have code like this:

#define SA_UNSUPPORTED 0x400
#define SA_XFLAGS 0x800

volatile sig_atomic_t have_xflags = 0;

         sa.sa_flags |= SA_UNSUPPORTED | SA_XFLAGS;
         if (sigaction(n, &sa, NULL))
                 if (!sigaction(n, NULL, &sa) &&
                                 !(sa.sa_flags & SA_UNSUPPORTED) &&
(sa.sa_flags & SA_XFLAGS))
                         have_xflags = 1;

> In the meantime, can I suggest:
>
>  (1) Come up with an extensible way of encoding supplementary
>      information in siginfo.  If the consensus is that zeroing unused
>      fields is sufficient and that the kernel and compiler will
>      reliably do it, then great.  Otherwise, we might need explicit
>      flags fields or something.

I thought about this for a while and concluded that we probably want a
flags field anyway. si_addr_ignored_bits is something of a special
case in the sense that we can define the zero value to mean
"unknown" by taking advantage of the mask field (which I suppose is
something of a flags field), but we can't necessarily say that the
same is true for any fields that we may add in the future. For
example, if we wanted to communicate whether the failing access is a
read or a write, we would need a tristate: read, write and "unknown"
(and arrange for old kernels' behavior to be interpreted as
"unknown"). If we rely on zeroing then we may implement this by adding
a field like:

char si_access_type; // 0 = unknown, 1 = read, 2 = write

But that's really just a (slightly wasteful, because we use the entire
byte) flags field, so we may as well define an actual flags field to
begin with and let people add their flags there.

Unfortunately we can't name it sa_flags because ia64 got there first.
We may consider making the ia64 field generic though (ia64 only uses
one bit of their field, so we would have 31 free bits). In the
meantime, I added a separate field, sa_xflags.

>  (2) Hack up any simple mechanism (such as your signal number flag) for
>      requesting/detecting the extra information.
>
> Along with an illustration of a application of the mechanism (i.e.,
> reporting address tag bits), this should at least provide a basis for
> further review.
>
> We can then try to swap in a different mechanism for (2) if people have
> still have concerns (or it not, keep it).

Sounds good. Apologies for not replying sooner, I was hoping that Eric
would chime in so that I would get a sense of which approach he would
prefer (so that I wouldn't spend as much time implementing in an
undesired direction), then this fell off my radar. I decided to go
with the SA_UNSUPPORTED approach that I mentioned above for now, and
I'll send a v9 with that implemented shortly. Most of the change is
about letting the architecture-independent code know which bits are
supported, so it should be easy to replace the detection mechanism
with another idea like the signal number bit.

Peter