[PATCH 2/2] ARM: i.MX: xload: consider ECC strength when reading page

Tue Jun 8 05:38:48 PDT 2021

On Tue, Jun 8, 2021 at 12:23 AM Andrej Picej <andrej.picej at norik.com> wrote:
> On 7. 06. 21 22:03, Trent Piepho wrote:
> > On Mon, Jun 7, 2021 at 2:32 AM Andrej Picej <andrej.picej at norik.com> wrote:
> >> - Samsung K9K8G08U0E (pagesize: 0x800, oobsize: 0x40)
> >> - Winbond W29N08GVSIAA (pagesize: 0x800, oobsize: 0x40) and
> >> - Spansion S34ML08G201FI00 (pagesize: 0x800, oobsize: 0x80).
> >>
> >> All NANDs having set ECC strength to 4 (13 bytes) despite Spansion NAND
> >> chip supporting ECC strength of 9 (29 bytes).

> tool uses NAND settings used by eboot, which are hardcoded to fixed
> pagesize of 0x800 bytes and oobsize of 0x40 bytes (8 ECC bits). If for

Ok, so 4 ecc bits was used for testing, but your actual use case is
for flash that uses 8 bits when NAND has 128 OOB bytes, which the
current code uses a value different than 8?  My calculation is that
0x800+0x80 would use 18 bit ECC.

But really, the exact numbers don't matter.  Just that your nand flash
tool, barebox xload, barebox main, uboot, uboot spl, linux, kobs-ng,
etc. don't all agree on ECC values.

> I agree that it would be better to use all of the space available, but
> if flasher used wrong settings to copy barebox binary to NAND these
> settings (although not optimal) should be used to make booting even
> possible.

But, how does one know 2nd stage barebox is flashed with the same ECC
as 1st stage xload?  See below.

>
> The main reason why I think we should use FCB here for this is because
> i.MX6's ROM already uses these values for booting into pre-bootloader.
> That's why we try to act in xloader like ROM does (reading NAND
> parameters from FCB). Nevertheless flasher tools should be responsible
> to match the BCH ECC page with what it is written into FCB. If that is

I think it's fair to assume that the barebox xload is using the ECC
from the FCB, otherwise it would not boot.  But does barebox 2nd stage
use same ECC as xload?  In your case, the answer is currently yes.
But is this always the case?

I don't know of a specific board where it is not, but I do know this:
It is common that a Linux based software update system will not update
the bootloader.  It might just do rootfs, or rootfs+kernel, but
bootloader is less common.  In two a stage system, xload + main, maybe
the xload is not updated.  It is a pain from Linux, with different
versions of kobs and/or kobs-ng, which are poorly maintained and
documented, a special attribute in sysfs that old Freescale kernels
had and that isn't around anymore that is sometimes needed and
sometimes not, etc.  And as I have just discovered, iMX6UL and iMX6ULL
use a different encoding of FCB that all other iMX and of course some
kobs-ng versions don't know this and create a broken FCB.

I even made a system that did this: barebox-xload had A/B support for
2nd stage and 2nd stage was updated, but the xload wasn't, since it
wasn't fail-safe.  But this was for CycloneV and doesn't apply here.

So, suppose we have updated barebox 2nd stage from Linux (or barebox)?
 Now it uses "common" ECC values (IMHO, "optimal" is not an accurate
term here) from Linux kernel.  Barebox-xload current works to boot
this, but your change will break that.

It is a difficult problem, either choice of a ECC values could be the
correct one.

> In our case the described proprietary flasher tool only flashes barebox
> so only NAND pages with barebox binary are using not optimal ECC
> settings. If for example kernel, devicetree and rootfs would be flashed
> from barebox the NAND pages there would use correct ECC size and booting
> into linux and updating those NAND pages from linux works. Updating
> barebox from barebox itself (using barebox_update) would mean that the
> barebox binary will be overwritten in NAND with optimal ECC settings and
> FCB will be updated accordingly.

Does barebox_update run in 2nd stage barebox update both 2nd stage
barebox and barebox-xload + FCB?

Consider what happens if barebox 2nd stage is updated from Linux.
Usually software update systems run on Linux, e.g. rauc or mender.  In
this case it will use Linux ECC settings, not FCB settings.

You've got boards with barebox-xload and barebox using different ECC
settings than kernel and rootfs.  And not just two different settings,
but also 2nd stage barebox and Linux don't know this.  I predict this
will be a source of much future pain.

> We are only using this ECC values to read barebox binary from NAND and
> copy it to RAM. If other NAND pages will be using different ECC values
> that doesn't break anything, I think. Only problem that I can see here
> is barebox or linux reading NAND pages occupied by barebox binary, this
> will most likely fail, but I don't see why that would be necessary anyway.
>
> I don't think we are braking anything here, we are just fixing booting
> barebox from NAND whit not optimal ECC settings.
>
> Please correct me if I'm wrong or if I'm missing something here?

You've got ECC settings for:
(xload barebox) (kernel rootfs)
But if someone had this:
(xload) (barebox kernel rootfs)
Then it breaks.

Why would they have that?  As I describe above, everything in the 2nd
set is updated from Linux using some software update system.

Of course, the most common way is this:
(xload barebox kernel rootfs)

With just one set, when the xload has two choices, FCB vs common
values, both are the same, so even if barebox is updated from Linux it
still works.

A solution that works for boths cases, but is also ugly and difficult,
is to try both.  If xload sees FCB values != calculated values, then
just try both settings.  One is virtually assured that the incorrect
settings will produce massive numbers of errors from BCH.  Read a
couple pages and the settings which result in uncorrectable ECC errors
on all pages are the wrong ones.