[PATCH v2] bootchooser: honour reset source

Thu Oct 19 06:45:11 PDT 2023

Hello Sascha,

Am 04.10.23 um 10:19 schrieb Sascha Hauer:
> 
> I have a problem with this. When a system is interrupted with a power
> failure or by pushing the reset button during boot then we can't say
> that this boot was "good". We just don't know. All we can do is to
> ignore that last boot for the calculation of the remaining attempts.
> 
> The end result might implement your desired behaviour, but the code
> leading to that result doesn't seem logical.

I can see your point with this valid edge case. However, this leads me
to the question why we already have a variable "last_boot_successful",
as we indeed can never know whether a previous boot really was
successful: Even something like a RAUC mark-good service can set a wrong
"success flag" i.e. when a critical error happens after that service has
already run.

This brings me to the idea of simply "inverting" the logic, changing the
code of my v2 commit from:

   if (!last_boot_successful &&
           test_bit(type, &good_reset_src_type)) {
                   last_boot_successful = true;
   }

to basically:

   if (!test_bit(type, &BAD_reset_src_type)) {
                   bootchooser_target_set_attempts(bc->last_chosen, -1);
   }

In that case, we would only reset the attempts once we have determined
that no "bad reset" has happened. Of course, that requires to gather all
unwanted reset modes and to list them in the barebox env, but the effect
on the result would be the same and we could avoid thinking about
uncertain goodness.
Scenarios like the combination of bad reset and blackout would of course
still slip through, but to me it seems like a more logical approach.

I would then further suggest to rework the concept of the variable
"last_boot_successful", as it is currently not used anyway - other than
to be set to "false" by the original code. Maybe "bad_reset_assumed"?

> 
> Also we have this snippet:
> 
> 	if (test_bit(RESET_ATTEMPTS_POWER_ON, &reset_attempts) &&
> 	    reset_source_get() == RESET_POR && !attempts_resetted) {
> 		pr_info("Power-on Reset, resetting remaining attempts\n");
> 		bootchooser_reset_attempts(bc);
> 		attempts_resetted = 1;
> 	}
> 
> I'm not sure if that already implements your desired behaviour, but it
> at least overlaps with the case you are implementing. Would it be an
> option to extend the global.bootchooser.reset_attempts variable with a
> "reset" bit and adjust the above accordingly?

I thoroughly thought about this alternative. This might in general work
for my case, but I am not sure if the necessary changes to the code are
justified by the outcome:

We currently have the array "reset_attempts_names", which only holds the
entries "power-on" and "all-zero". As we want to be able to work with
any reset reason, we would have the respective array "reset_src_names"
(from include/reset_source.c/h) to be merged into that. As of my
understanding, this would either mean to

   - manually mirror all entries from "reset_src_names" into
     "reset_attempts_names" directly within the source code, or

   - memcpy "reset_src_names" into "reset_attempts_names" during
     bootchooser_init(), which would require to make
     the original "reset_attempts_names" not to be a const anymore.

The behaviour might also become inconsistent: As of now, "power-on" and
"all-zero" lead to ALL boot slots being reset to their default attempts
values.
In contrast to that, the evaluation good/bad reset is meant to only
affect the current active slot.
While it is possible to implement that, I don't think it is a good idea
to have the behaviour of which slots(s) will be reset depend on your
bitmask in such an intransparent way.

Any thoughts on your side?

Regards,
Holger

-- 
Pengutronix e.K.                         | Holger Assmann              |
Steuerwalder Str. 21                     | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686         | Fax:   +49-5121-206917-5555 |