Best practices for shell libraries (specifically error-checking)

Mon Nov 20 07:03:31 PST 2023

On 2023-11-18 20:42, Philip Prindeville wrote:
> Hi all,
> 
> I'm trying to figure out what best practices are for shell libraries, as I'm working on changes to a pretty significant library which I envision being leveraged in a lot of places.
> 
> My questions are these:
> 
> * should a good library do error-checking for the caller, or expect the caller to have done his own validation? Given that most exploits are the result in insufficient input validation (more than 60% according to a study by the SEI), it seems that the library should do it.  Also, it's one-and-done, rather than duplicating error-checking in all the places that the library might be used.  And if a missing check is uncovered later?  I'd rather have one place to patch it than many.
> 

Error checking is best done in a single location, where it is relevant 
to do so. Always check input.

> * in my case, the library has both predicate tests that do validation and their only output is a true/false return value, as well as functions that perform a calculation on their inputs and echo an output to stdout.  So there's a mix of:
> 
>      if ! is_validity_check "$myarg"; then
>          # do some stuff
>      fi
> 
> as well as:
> 
>      var="$(compute_something "$n" "$m")"
> 
> and it's this later that is a little more thorny. The issue is that we're capturing itss output, but if it's also performing error-checking, then this will happen in a sub shell and the only way to indicate failure is either to generate no output, or to check the exit value of the sub shell.  So we can do:
> 
>      var="$(compute_something "$n" "$m")"
>      if [ -z "$var" ]; then
>          # handle the error
>      fi
> 
> or else:
> 
>      var="$(compute_something "$n" "$m")" || { # handle the error ; }
> 
> And I'm inclined to go with the later, it's less wordy and the handling can often be something like a function that takes an argument which it then echos to stderr, and then exits non-zero.  Easy-peasy and simple to read and understand.
> 
> The hard part is when this function that writes its result to standard output is being used in a different, uncommon way that things get complicate.  Like:
> 
>      ( compute_something "$n" "$m" >> something.conf ) || { # handle the error ; }
> 
> where they aren't invoking the function via a sub shell unless they do so explicitly (and they might not have read the functions or accompanying comments and be aware of how to use them safely).
> 
> * so then there's a third option.  Always return a true/false status code, don't emit anything to standard output, and just set some unlikely global variables like __compute_something_ret_val1, etc.  Ugly, but potentially less disastrous if not invoked properly... but the flip-side is that the user might not check return values and continue on his bliss, perilous way.  It also avoids fork/exec pairs and everything runs in a single process, so that's nice (as an aside, an enhancement to shell might have been to have $(...) run in a separate thread if it only used built-ins, and you could still use a pipe to capture its output).
> 
> Yeah, shell is a 40+ year old language that we're asking to do more than it was ever intended to do.  It would have been nice if in the last 20 years someone had come up with a way to add new built-in functions via .so plugins that provided language extensions.  But they didn't and that's not on option.
> 
> What is the collective wisdom?  Coming from a security coding background, I prefer to make things blow up on invalid input unless the user (caller) takes steps to recover meaningfully from that failure.

Blow up: This flags that an error potentially exists and should be 
fixed. Users describe this as a 'bug', even if it's better to fail 
closed than fail open. We typically do this in code with assert. The 
thing I dislike about this is that dependent processes may hang or crash 
and the user is none the wiser.

I would much prefer, however, that a meaningful, and helpful error 
message is emitted so that a user could manually recover from such an 
error or present a coherent bug report.

If possible, use (unique) return codes, and not just true/false. That's 
what libraries do.

If one cannot use unique return codes, until it's true, it's always 
false. Or in other words, lack of yes or no is still a no (fail closed).

Multiple return paths can ensure the caller depends on (okay, coupling, 
bad) specific values instead of simply true or false.