[From nobody Thu Jun 25 05:55:36 2020
Received: from mr85p00im-ztdg06011901.me.com ([17.58.23.198])
 by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux))
 id 1in4SX-00070w-NP
 for openwrt-devel@lists.openwrt.org; Thu, 02 Jan 2020 17:40:35 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=me.com; s=1a1hai;
 t=1577986829; bh=bezvlKC0ZpY2nVXp+MtdPVAtOl77hZAnQV83+p3rnEw=;
 h=Content-Type:Subject:From:Date:Message-Id:To;
 b=vNsgZ7S5vmuWBDBpYQgOBpqpaLu7kz5LuC46/JD/3/9IPtWRQb4HVLYUIZnst655m
 JvyanttCwjYS2yhHNfWuwi/DOs9GQZTBqAinCjHOsikUPW1eynGb99tSigEXWIy6QF
 7xawUG1QeylAN56SZNBMHL41IV7oSFn46IQ3WEBLfvEsmrAV1JQBi6VhifE9fI6ITA
 fLlZcDwaL+z9ZkKarbJ3/ZmxjjLkCJPs+6iI+azZg/bE4aTVThOKwkSUTSeAsIXVCN
 RXtWN9Uk62XabMnbNkzHk3/kNZxk6F8f/M5ulqwVW7Webcaj44VsYIv+2t3UxmcMOU
 jss1aEQn2tbPw==
Received: from mbp-2.lan (78-80-17-93.nat.epc.tmcz.cz [78.80.17.93])
 by mr85p00im-ztdg06011901.me.com (Postfix) with ESMTPSA id 5E54BA60DFF;
 Thu,  2 Jan 2020 17:40:23 +0000 (UTC)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3608.40.2.2.4\))
Subject: Re: [OpenWrt-Devel] Sysupgrade possibly broken in recent development
 snapshots: "message": "Firmware image couldn't be validated"
From: =?utf-8?Q?Petr_Nov=C3=A1k?= <petrn@me.com>
In-Reply-To: <60DBDE96-C4EB-42D9-8927-DF7771685F0A@volatilesystems.org>
Date: Thu, 2 Jan 2020 18:40:18 +0100
Cc: openwrt-devel@lists.openwrt.org, Hannu Nyman <hannu.nyman@iki.fi>,
 =?utf-8?Q?Petr_=C5=A0tetiar?= <ynezz@true.cz>
Content-Transfer-Encoding: quoted-printable
Message-Id: <5B5E9BA4-0BE8-4FDD-B089-35658B983306@me.com>
References: <20191231095801.GK70184@meh.true.cz>
 <46C7C775-CDBB-4E84-8C7F-A0F949F1F981@me.com>
 <20191231134925.GL70184@meh.true.cz>
 <C9B93DB4-A2CA-455B-8B4F-E7A23E34D141@me.com>
 <20200101124453.GM70184@meh.true.cz>
 <2DF80201-77E5-4301-9046-67165E5A8B9C@me.com>
 <20200101161447.GQ70184@meh.true.cz>
 <DC52BD3D-AB2B-426F-A184-C1F7664BB076@me.com>
 <20200101200825.GR70184@meh.true.cz>
 <C6E8AA31-AE61-40F5-881B-A69A2007272B@me.com>
 <20200101204630.GS70184@meh.true.cz>
 <5e23fc80-72c7-5fe8-cf61-1b1390844a64@iki.fi>
 <60DBDE96-C4EB-42D9-8927-DF7771685F0A@volatilesystems.org>
To: Stijn Segers <foss@volatilesystems.org>
X-Mailer: Apple Mail (2.3608.40.2.2.4)
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, ,
 definitions=2020-01-02_05:, , signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0
 malwarescore=0
 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 mlxscore=0
 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=8.0.1-1908290000 definitions=main-2001020149
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20200102_094033_786044_3092852E 
X-CRM114-Status: GOOD (  17.87  )
X-Spam-Score: -0.9 (/)
X-Spam-Report: SpamAssassin version 3.4.2 on bombadil.infradead.org summary:
 Content analysis details:   (-0.9 points)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at https://www.dnswl.org/,
 low trust [17.58.23.198 listed in list.dnswl.org]
 -0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail
 provider (petrn[at]me.com)
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 -0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
 envelope-from domain
 -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
 -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
 author's domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
 valid

Q: Are all the platforms where this problem has been observed multi-core
(like the RPi4 or mt7621), or has it ever been seen on a single-core system?

Did the QEMU test that Petr S. ran use a multi-core or a single-core
emulation?

Petr



> On 2 Jan 2020, at 18:36, Stijn Segers <foss@volatilesystems.org> wrote:
>
> Hannu Nyman <hannu.nyman@iki.fi> wrote on 2 January 2020 at 16:48:08 CET:
>> Petr Štetiar wrote on 1.1.2020 at 22:46:
>>> Petr Novák <petrn@me.com> [2020-01-01 21:11:30]:
>>>
>>>> But how come the workaround was to use an older libubox and ubus - was
>>>> there any new check which was not there before?
>>> I don't have a definitive answer, as I would need an RPi-4 (or any other
>>> real hardware with a Cortex-A72 core) to find the actual bit in libubox
>>> which caused this change in behavior, but here is a part of the commit
>>> description[1] which might help answer that:
>>>
>>>  It seems like the recent fixes in the libubox library, particularly in
>>>  the jshn sub-component (which empowers json_dump used in the shell
>>>  script executed by the child process) made the execution somehow faster,
>>>  thus exposing this racy behaviour in the validate_firmware_image_call at
>>>  least on RPi-4 (Cortex-A72) target.
>>>
>>> As I was unable to trigger this issue even in QEMU/Cortex-A72, I assume
>>> that it was simply some kind of race that needed specific timing,
>>> provided precisely only by that RPi-4 hardware.
>>
>>
>> I think that there may have been an older race condition that has now just
>> surfaced more readily on the RPi4 after the recent changes. It has
>> manifested itself earlier on some routers, but more rarely.
>>
>> I have seen an occasional sysupgrade failure on one of my routers since
>> October (ar71xx or ath79 / WNDR3700v2).
>
> I've seen the same multiple times on more than one mt7621 device and I
> opened FS#2696 on this.
>
> Granted, not the most verbose bug report.
>
> Stijn
>
>> I wrote about that to the mailing list in November, although then I thought
>> that it might be just a "force" option failure:
>> http://lists.infradead.org/pipermail/openwrt-devel/2019-November/019996.html
>>
>> Others have seen that also, based on forum discussion:
>> https://forum.openwrt.org/t/build-for-wndr3700v1-v2-wndr3800/64/295
>>
>> Petr Novak describes something similar to my error: "it does just reboot
>> but does not flash anything."
>>
>> I have tried to debug this on my WNDR3800, which has a serial console
>> connection, but have not managed to reproduce the error there. On the 3800
>> sysupgrade has always succeeded. However, on my 3700v2 (which has identical
>> hardware except for the RAM size) on the other side of the building, I
>> still occasionally see a LuCI-based sysupgrade start OK, but the router
>> boots back into the same firmware after an invisible error. After that
>> reboot, the next sysupgrade attempt via LuCI usually works fine. (It sounds
>> like a sysupgrade from a recently booted system usually works, but
>> sysupgrading a system after some runtime sometimes does not.)
>>
>> I first thought that it was related to using force in the ar71xx/ath79
>> jump, but it has also been present in normal sysupgrades.
>>
>> Possibly a manifestation of the same race condition in
>> sysupgrade/procd/libubox, so hopefully your patches will fix that too.
>>
>> _______________________________________________
>> openwrt-devel mailing list
>> openwrt-devel@lists.openwrt.org
>> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
>


]