[RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals
James Morse
james.morse at arm.com
Tue Feb 13 05:58:55 PST 2018
Hi Dave,
On 30/01/18 18:50, Dave Martin wrote:
> Currently, as reported by Eric, an invalid si_code value 0 is
> passed in many signals delivered to userspace in response to faults
> and other kernel errors. Typically 0 is passed when the fault is
> insufficiently diagnosable or when there does not appear to be any
> sensible alternative value to choose.
>
> This appears to violate POSIX, and is intuitively wrong for at
> least two reasons arising from the fact that 0 == SI_USER:
>
> 1) si_code is a union selector, and SI_USER (and si_code <= 0 in
> general) implies the existence of a different set of fields
> (siginfo._kill) from that which exists for a fault signal
> (siginfo._sigfault). However, the code raising the signal
> typically writes only the _sigfault fields, and the _kill
> fields make no sense in this case.
>
> Thus when userspace sees si_code == 0 (SI_USER) it may
> legitimately inspect fields in the inactive union member _kill
> and obtain garbage as a result.
>
> There appears to be software in the wild relying on this,
> albeit generally only for printing diagnostic messages.
>
> 2) Software that wants to be robust against spurious signals may
> discard signals where si_code == SI_USER (or <= 0), or may
> filter such signals based on the si_uid and si_pid fields of
> siginfo._sigkill. In the case of fault signals, this means
> that important (and usually fatal) error conditions may be
> silently ignored.
>
> In practice, many of the faults for which arm64 passes si_code == 0
> are undiagnosable conditions such as exceptions with syndrome
> values in ESR_ELx to which the architecture does not yet assign any
> meaning, or conditions indicative of a bug or error in the kernel
> or system and thus that are unrecoverable and should never occur in
> normal operation.
>
> The approach taken in this patch is to translate all such
> undiagnosable or "impossible" synchronous fault conditions to
> SIGKILL, since these are at least probably localisable to a single
> process. Some of these conditions should really result in a kernel
> panic, but due to the lack of diagnostic information it is
> difficult to be certain: this patch does not add any calls to
> panic(), but this could change later if justified.
>
> Although si_code will not reach userspace in the case of SIGKILL,
> it is still desirable to pass a nonzero value so that the common
> siginfo handling code can detect incorrect use of si_code == 0
> without false positives. In this case the si_code dependent
> siginfo fields will not be correctly initialised, but since they
> are not passed to userspace I deem this not to matter.
>
> A few faults can reasonably occur in realistic userspace scenarios,
> and _should_ raise a regular, handleable (but perhaps not
> ignorable/blockable) signal: for these, this patch attempts to
> choose a suitable standard si_code value for the raised signal in
> each case instead of 0.
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9b7f89d..4baa922 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
[..]
> + { do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" },
> + { do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" },
> + { do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" },
> + { do_sea, SIGKILL, SI_KERNEL, "level 3 (translation table walk)" },
> + { do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented
I agree the translation-table related external-aborts should end up with
SIGKILL: there is nothing user-space can do.
You use the fault_info table to vary the signal and si_code that should be used,
but do_mem_abort() only uses these if the fn returns an error. For do_sea(),
regardless of the values in this table SIGBUS will be generated as it always
returns 0.
> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr,
struct pt_regs *regs)
>
> info.si_signo = SIGBUS;
> info.si_errno = 0;
> - info.si_code = 0;
> + info.si_code = BUS_OBJERR;
> if (esr & ESR_ELx_FnV)
> info.si_addr = NULL;
> else
do_sea() has the right fault_info entry to hand, so I think these need to change
to inf->sig and inf->code. (I assume its not valid to set si_addr for SIGKILL...)
Thanks,
James
More information about the linux-arm-kernel
mailing list