Reset on Beaglebone Black has become unreliable/broken

Konstantin Kletschke konstantin.kletschke at inside-m2m.de
Thu Nov 28 01:46:10 PST 2024


On Thu, Nov 28, 2024 at 10:23:10AM +0100, Ahmad Fatoum wrote:

> I assume this should be v2022.04? -dirty means you have local patches
> on top. Do any of them touch SoC-specific, board-specific parts
> like clock or power?

Yes, it is "barebox 2022.04.0-dirty #1 Tue Sep 10 08:45:54 UTC 2024".
The patches we apply do not touch any clock or power code. They touch:
environment, kernel cmdline, watchdog settings, bootchooser config,
autoabortkey. Config stuff.

> What changed over the last week on the software side? I understand barebox
> stayed the same? Is the kernel still the same?

We changed nothing. I have been shipping this barebox version together
with the kernel for a couple of months. Last week we only ramped up the
quantity, but the failure percentage is so high that it should have
happened a couple of times before.

> On affected hardware: Does this happen always or only some times?

Always. Easily reproducible.
Meanwhile I realized that on affected BBBs it can be reproduced this way:

Boot, hit Ctrl-C to stop barebox at the prompt.
Hit the S1 button, which is wired to NRESET_INOUT ball A10 (it's not S2
as I initially wrote, it's S1).
System is stuck/frozen/dead.

> This sounds very similar to the issue fixed in commit 9c1a78f959dd
> ("Revert "ARM: beaglebone: init MPU speed to 800Mhz""), but that's already
> included in v2022.04.0, hence the question if you have patches that
> do anything similar.

Sounds interesting, I will take a look. As said, we do not patch clocks,
voltages or anything like that.

> Yes, but it sounds strange that only now these problems pop up?

Yes. Last week we started to experience this problem in production. We
have ~200 working BBBs, ~20 have this problem. The batch worked
flawlessly, but suddenly a couple of broken BBBs piled up on one day,
and now it happens from time to time.

I am not even sure whether software is to blame or whether the hardware
is or has become glitchy, but after flashing stock U-Boot, it is still
able to reset/restart on its own on these devices.

> Besides checking what changed, you should check if Linux is playing
> around with the voltages powering the SoC and if it does, disable that
> to see if it improves the situation.

Sadly (or gladly?) Linux is not involved on affected BBBs. Boot, stop in
the bootloader, hit S1, system freezes.

> Your barebox restart handler is probably am33xx_restart_soc (named
> "soc" in reset -l output).

I will poke around. I have never dealt with reset code in my life :-)

Regards
Konsti


-- 
INSIDE M2M GmbH
Konstantin Kletschke
Berenbosteler Straße 76 B
30823 Garbsen

Telefon: +49 (0) 5137 90950136
Mobil: +49 (0) 151 15256238
Fax: +49 (0) 5137 9095010

konstantin.kletschke at inside-m2m.de
http://www.inside-m2m.de 

Geschäftsführung: Michael Emmert, Derek Uhlig
HRB: 111204, AG Hannover



