[PATCH v8] arm64: Expose FAR_EL1 tag bits in siginfo

Wed Jun 24 15:51:43 EDT 2020

On Wed, Jun 24, 2020 at 10:12 AM Dave Martin <Dave.Martin at arm.com> wrote:
>
> On Wed, Jun 24, 2020 at 09:51:49AM -0700, Peter Collingbourne wrote:
> > On Wed, Jun 24, 2020 at 2:28 AM Dave Martin <Dave.Martin at arm.com> wrote:
> > >
> > > On Tue, Jun 23, 2020 at 05:40:08PM -0700, Peter Collingbourne wrote:
> > > > On Tue, Jun 23, 2020 at 10:52 AM Eric W. Biederman
> > > > <ebiederm at xmission.com> wrote:
> > > > >
> > > > > Dave Martin <Dave.Martin at arm.com> writes:
> > > > >
> > > > > > On Tue, Jun 23, 2020 at 07:54:59AM -0500, Eric W. Biederman wrote:
> > > > > >> Peter Collingbourne <pcc at google.com> writes:
> > > > > >>
> > > > > >> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > >> > index 47f651df781c..a8380a2b6361 100644
> > > > > >> > --- a/arch/arm64/kernel/traps.c
> > > > > >> > +++ b/arch/arm64/kernel/traps.c
> > > > > >> > @@ -235,20 +235,41 @@ static void arm64_show_signal(int signo, const char *str)
> > > > > >> >  }
> > > > > >> >
> > > > > >> >  void arm64_force_sig_fault(int signo, int code, void __user *addr,
> > > > > >> > +                     unsigned long far, unsigned char far_tb_mask,
> > > > > >> >                       const char *str)
> > > > > >> >  {
> > > > > >> >    arm64_show_signal(signo, str);
> > > > > >> > -  if (signo == SIGKILL)
> > > > > >> > +  if (signo == SIGKILL) {
> > > > > >> >            force_sig(SIGKILL);
> > > > > >> > -  else
> > > > > >> > -          force_sig_fault(signo, code, addr);
> > > > > >> > +  } else {
> > > > > >> > +          struct kernel_siginfo info;
> > > > > >> > +          clear_siginfo(&info);
> > > > > >> > +          info.si_signo = signo;
> > > > > >> > +          info.si_errno = 0;
> > > > > >> > +          info.si_code = code;
> > > > > >> > +          info.si_addr = addr;
> > > > > >> > +          info.si_addr_top_byte = (far >> 56) & far_tb_mask;
> > > > > >> > +          info.si_addr_top_byte_mask = far_tb_mask;
> > > > > >> > +          force_sig_info(&info);
> > > > > >> > +  }
> > > > > >> >  }
> > > > > >> >
> > > > > >> >  void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
> > > > > >> > -                      const char *str)
> > > > > >> > +                      unsigned long far, const char *str)
> > > > > >> >  {
> > > > > >> > +  struct kernel_siginfo info;
> > > > > >> > +
> > > > > >> >    arm64_show_signal(SIGBUS, str);
> > > > > >> > -  force_sig_mceerr(code, addr, lsb);
> > > > > >> > +
> > > > > >> > +  clear_siginfo(&info);
> > > > > >> > +  info.si_signo = SIGBUS;
> > > > > >> > +  info.si_errno = 0;
> > > > > >> > +  info.si_code = code;
> > > > > >> > +  info.si_addr = addr;
> > > > > >> > +  info.si_addr_lsb = lsb;
> > > > > >> > +  info.si_addr_top_byte = far >> 56;
> > > > > >> > +  info.si_addr_top_byte_mask = 0xff;
> > > > > >> > +  force_sig_info(&info);
> > > > > >> >  }
> > > > > >>
> > > > > >> I have a real problem with this construction.  force_sig_info is not an
> > > > > >> interface that should be used for anything except to define a wrapper
> > > > > >> that takes it's parameters.
> > > > > >
> > > > > > Can you elaborate?  How would you do this king of thing.
> > > > >
> > > > > There are no other uses of force_sig_info in architecture code.
> > > > >
> > > > > I just removed them _all_ because they were almost all broken.
> > > > > In fact your mcerr case is broken because it uses two different
> > > > > union members simultantiously.
> > > >
> > > > Is that really broken? I thought that the Linux kernel deliberately
> > > > didn't care about strict aliasing rules (the top-level Makefile passes
> > > > -fno-strict-aliasing) so I thought that it was valid in "Linux kernel
> > > > C" even though from a standards point of view it is invalid. (That
> > > > being said, this is probably moot with my proposed changes below
> > > > though.)
> > >
> > > I have a feeling that -fno-strict-aliasing only allows you to _read_ a
> > > different union member from the one previously written.
> > >
> > > Writing a different member from the last one written can still splatter
> > > on the other members IIUC.
> > >
> > > It would be better to keep things separate rather than risk
> > > incorrectness just to save a few bytes.
> > >
> > > IMHO -fno-strict-aliasing is no excuse for gratuitous type-punning.
> > >
> > > > > So I am looking for something like force_sig_mcerr or force_sig_fault
> > > > > that includes your new information that then calls force_sig_info.
> > > > >
> > > > > I know of no other way to safely use the siginfo struct.
> > > >
> > > > So you want something like:
> > > >
> > > > int force_sig_fault_with_ignored_bits(int signo, int code, void __user
> > > > *addr, uintptr_t addr_ignored, uintptr_t addr_ignored_mask);
> > > > int force_sig_mceerr_with_ignored_bits(int code, void __user *addr,
> > > > short lsb, uintptr_t addr_ignored, uintptr_t addr_ignored_mask);
> > > >
> > > > in kernel/signal.c and the code in arch/arm64 would call that?
> > > >
> > > > > > AIUI we absolutely need a forced signal here, we need to supply
> > > > > > metadata, and we don't have to open-code all that at every relevant
> > > > > > signal generation site...
> > > > > >
> > > > > >> It is not clear to me that if you have adapted siginfo_layout.
> > > > > >
> > > > > > Garbled sentence?
> > > > >
> > > > > Looks like.  One of the pieces of code that needs to change
> > > > > when siginfo gets updated is siginfo_layout so that the structure
> > > > > can be properly decoded and made sense of.
> > > > >
> > > > > I am not seeing anything like that.
> > > >
> > > > Okay, this has to do with copying between the compat and non-compat
> > > > versions of the struct? Sure, I can update that, although the code
> > > > would be basically non-functional on arm64 because TBI isn't supported
> > > > on 32-bit ARM.
> > > >
> > > > > >> > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> > > > > >> > index cb3d6c267181..6dd82373eb2d 100644
> > > > > >> > --- a/include/uapi/asm-generic/siginfo.h
> > > > > >> > +++ b/include/uapi/asm-generic/siginfo.h
> > > > > >> > @@ -91,6 +91,14 @@ union __sifields {
> > > > > >> >                            char _dummy_pkey[__ADDR_BND_PKEY_PAD];
> > > > > >> >                            __u32 _pkey;
> > > > > >> >                    } _addr_pkey;
> > > > > >> > +#ifdef __aarch64__
> > > > > >> > +                  /* used with all si_codes */
> > > > > >> > +                  struct {
> > > > > >> > +                          short _dummy_top_byte;
> > > > > >
> > > > > > ^ What's this for?  I don't have Eric's insight here.
> > > >
> > > > We would need a short's worth of padding in order to prevent the
> > > > fields from occupying the same address as si_addr_lsb.
> > > >
> > > > > >
> > > > > >> > +                          unsigned char _top_byte;
> > > > > >> > +                          unsigned char _top_byte_mask;
> > > > > >> > +                  } _addr_top_byte;
> > > > > >> > +#endif
> > > > > >> >            };
> > > > > >> >    } _sigfault;
> > > > > >> >
> > > > > >>
> > > > > >> Why the _dummy_top_byte?  Oh I see it should be spelled "short _addr_lsb;".
> > > > > >>
> > > > > >> Please remove the "#ifdef __aarch64__".  If at all possible we want to
> > > > > >> design this so any other architecture who has this challenge can use the
> > > > > >> code.  The kind of code does not get enough attention/maintenance if it
> > > > > >> is built for a single architecture.
> > > >
> > > > Seems reasonable. I was recently made aware that RISC-V was
> > > > considering a similar feature:
> > > > https://lists.riscv.org/g/tech-tee/topic/risc_v_tbi_proposal/72855478
> > > > I would have opted to expand this to other architectures on an
> > > > as-needed basis, but I'd also be fine with having it on all
> > > > architectures from the start.
> > > >
> > > > If we make this arch-independent, we have an additional concern, which
> > > > is "what if some future architecture wants more than one byte here?"
> > > > For example, an architecture may have a "top-two-bytes-ignore"
> > > > feature, which would imply two-byte (misnamed) "si_addr_top_byte" and
> > > > "si_addr_top_byte_mask" fields. And the RISC-V proposal potentially
> > > > implies many more ignored bits (see slide 13 of the presentation). The
> > > > maximum size that these fields can possibly be is the size of a
> > > > pointer, and with that there wouldn't be enough room in the padding at
> > > > this point to accommodate the new fields.
> > > >
> > > > That basically implies your earlier suggestion of adding a union
> > > > member here to accommodate future expansion of the union, and adding
> > > > the new fields after the union. I'm happy to make that change, with
> > > > the fields renamed "si_addr_ignored" and "si_addr_ignored_mask".
> > >
> > > I think what we need here is basically a flags word.
> > >
> > > So long as we keep a flag spare to indicate the existence of a further
> > > flags word, we can extend as needed.
> > >
> > > How the existence of the first flags words is detected is another
> > > problem.  If it only applies for newly-defined si_code values, then
> > > I guess si_code may be sufficient.
> >
> > Existing kernels will zero-initialize unused regions of the siginfo
> > data structure. The zero-initialization of the padding at the end of
> > the struct is done by the clear_user call here:
> > https://github.com/torvalds/linux/blob/3e08a95294a4fb3702bb3d35ed08028433c37fe6/kernel/signal.c#L3193
> >
> > and the zero-initialization of the padding between fields and unused
> > union members is done by the clear_siginfo function which the kernel
> > calls when initializing the data structure:
> > https://github.com/torvalds/linux/blob/3e08a95294a4fb3702bb3d35ed08028433c37fe6/include/linux/signal.h#L20
> >
> > Therefore, a flag word value of 0 may be used to detect a lack of
> > support for flagged fields.
>
> It's not enough that we do this today.  We would have had to do it back
> to the dawn of time (though in the arm64 case I guess we just need to go
> back to when the arch/arm64 was merged).
>
> v2.6.12:kernel/signal.c:copy_siginfo_to_user() suggests that this wasn't
> always the case, so unused parts of siginfo could be full of old junk
> from the user stack, if the kernel is sufficiently old.
>
> If we're trying to do something generic that makes sense on all arches,
> this matters.  I may have misunderstood something about the code though.

Hmm, I think you're right. The current behavior was introduced by
commit c999b933faa5e281e3af2e110eccaf91698b0a81 which was first
released in 4.18. So if an application wants to be compatible with
pre-4.18 kernels then there would need to be some other way to
indicate that the fields are valid. Probably the simplest way would be
to have the application issue a uname(2) syscall and check the kernel
version before reading these fields. I have a couple of other ideas
that don't rely on version detection, if we'd prefer to avoid that.
(They are somewhat ugly, but our hand is forced by backwards
compatibility.)

One idea is to re-purpose the si_errno field as a flags field for
certain signal numbers. I checked a few kernel releases going back to
2.6.18 and it looks like the field is set to 0 except in the following
circumstances:
- sending a hardware breakpoint (SIGTRAP/TRAP_HWBKPT)
- seccomp failures (SIGSYS/SYS_SECCOMP)
- user-defined signal via kill_pid_usb_asyncio
- SIGSWI in 3.18 and before (code since removed)

It is also set to EFAULT for certain SIGSEGV/SEGV_MAPERR signals on
powerpc since commit c96c4436aba4c12f1f48369f2f90bc43e12fe36c, which
is currently unreleased. So if we wanted to go this route for SIGSEGV
we would need to stop the kernel from setting si_errno to EFAULT for
this signal before the 5.8 release.

Another idea was to have userspace set a flag in sa_flags when
registering a signal handler meaning "this signal handler requires
unknown siginfo fields to be zeroed", and have existing kernels reject
the syscall due to an unknown flag being set, but unfortunately this
won't work because existing kernels do not reject sigaction syscalls
with unknown flags set in sa_flags. A perhaps more radical idea in
this vein would be to claim some of the upper bits of the signal
number as flags that will cause the syscall to be rejected if set and
unknown to the kernel. Existing kernels (going back to at least
2.6.18) contain this code in do_sigaction:

        if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
                return -EINVAL;

and vald_signal is defined as:

static inline int valid_signal(unsigned long sig)
{
        return sig <= _NSIG ? 1 : 0;
}

All architectures define _NSIG as a value <= 128, so they will reject
a signal number with any of bits 8-31 set. This means that we can use
any of those bits for mandatory flags. Most likely we could use bit 30
(expanding down as necessary), as it keeps the signal number positive
and permits future expansion of the signal number range.

Peter