[PATCH 2/2] ARM: i.MX: xload: consider ECC strength when reading page

Tue Jun 8 23:34:34 PDT 2021

On 8. 06. 21 14:38, Trent Piepho wrote:
> On Tue, Jun 8, 2021 at 12:23 AM Andrej Picej <andrej.picej at norik.com> wrote:
>> On 7. 06. 21 22:03, Trent Piepho wrote:
>>> On Mon, Jun 7, 2021 at 2:32 AM Andrej Picej <andrej.picej at norik.com> wrote:
>>>> - Samsung K9K8G08U0E (pagesize: 0x800, oobsize: 0x40)
>>>> - Winbond W29N08GVSIAA (pagesize: 0x800, oobsize: 0x40) and
>>>> - Spansion S34ML08G201FI00 (pagesize: 0x800, oobsize: 0x80).
>>>>
>>>> All NANDs had ECC strength set to 4 (13 bytes) despite the Spansion NAND
>>>> chip supporting ECC strength of 9 (29 bytes).
> 
>> tool uses NAND settings used by eboot, which are hardcoded to fixed
>> pagesize of 0x800 bytes and oobsize of 0x40 bytes (8 ECC bits). If for
> 
> Ok, so 4 ecc bits was used for testing, but your actual use case is
> for flash that uses 8 bits when NAND has 128 OOB bytes, which the
> current code uses a value different than 8?  My calculation is that
> 0x800+0x80 would use 18 bit ECC.

Actually 8 ECC bits were used for testing. It was probably wrong of me to
call EccBlockNEccType (from the i.MX 6Dual/6Quad Applications Processor
Reference Manual) the ECC strength in the commit message, since it is
shifted left by one bit to get the ECC strength in bits. So yes, we agree:
8 bit ECC for 0x800+0x40 (4<<1 = 8) and 18 bit ECC for 0x800+0x80 (9<<1 = 18).
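
To make the arithmetic explicit, here is a rough sketch of how the numbers
for the two geometries discussed here (0x800+0x40 and 0x800+0x80) could be
derived. The layout constants (10 metadata bytes, 512-byte ECC chunks,
Galois field length 13) and all function names are my assumptions for
illustration, not something taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BCH layout constants, not taken from the patch. */
#define METADATA_BYTES  10u
#define ECC_CHUNK_BYTES 512u
#define GF_LEN          13u

/* Largest even BCH strength (bits per chunk) that fits into the OOB area. */
static uint32_t calc_ecc_strength(uint32_t pagesize, uint32_t oobsize)
{
	uint32_t chunks, strength;

	assert(pagesize >= ECC_CHUNK_BYTES && oobsize > METADATA_BYTES);
	chunks = pagesize / ECC_CHUNK_BYTES;
	strength = ((oobsize - METADATA_BYTES) * 8u) / (GF_LEN * chunks);
	return strength & ~1u; /* round down to an even value */
}

/* Register field value (EccBlockNEccType): strength in bits divided by two. */
static uint32_t strength_to_ecc_type(uint32_t strength_bits)
{
	return strength_bits >> 1;
}

/* ECC bits recovered from the register field by shifting left by one. */
static uint32_t ecc_type_to_strength(uint32_t ecc_type)
{
	return ecc_type << 1;
}

/* Parity size per chunk in whole bytes, rounded down. */
static uint32_t ecc_bytes_per_chunk(uint32_t strength_bits)
{
	return (strength_bits * GF_LEN) / 8u;
}
```

Under these assumptions a 0x800+0x40 chip ends up with 8 bit ECC (field
value 4, 13 bytes per chunk) and a 0x800+0x80 chip with 18 bit ECC (field
value 9, 29 bytes per chunk), which matches the values mentioned in this
thread.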

> 
> But really, the exact numbers don't matter.  Just that your nand flash
> tool, barebox xload, barebox main, uboot, uboot spl, linux, kobs-ng,
> etc. don't all agree on ECC values.
> 
>> I agree that it would be better to use all of the space available, but
>> if flasher used wrong settings to copy barebox binary to NAND these
>> settings (although not optimal) should be used to make booting even
>> possible.
> 
> But, how does one know 2nd stage barebox is flashed with the same ECC
> as 1st stage xload?  See below.
> 
>>
>> The main reason why I think we should use FCB here for this is because
>> i.MX6's ROM already uses these values for booting into pre-bootloader.
>> That's why we try to act in xloader like ROM does (reading NAND
>> parameters from FCB). Nevertheless flasher tools should be responsible
>> to match the BCH ECC page with what it is written into FCB. If that is
> 
> I think it's fair to assume that the barebox xload is using the ECC
> from the FCB, otherwise it would not boot.  But does barebox 2nd stage
> use same ECC as xload?  In your case, the answer is currently yes.
> But is this always the case?
> 
> I don't know of a specific board where it is not, but I do know this:
> It is common that a Linux based software update system will not update
> the bootloader.  It might just do rootfs, or rootfs+kernel, but
> bootloader is less common.  In a two stage system, xload + main, maybe
> the xload is not updated.  It is a pain from Linux, with different
> versions of kobs and/or kobs-ng, which are poorly maintained and
> documented, a special attribute in sysfs that old Freescale kernels
> had and that isn't around anymore that is sometimes needed and
> sometimes not, etc.  And as I have just discovered, iMX6UL and iMX6ULL
> use a different encoding of FCB than all other iMX and of course some
> kobs-ng versions don't know this and create a broken FCB.
> 
> I even made a system that did this: barebox-xload had A/B support for
> 2nd stage and 2nd stage was updated, but the xload wasn't, since it
> wasn't fail-safe.  But this was for CycloneV and doesn't apply here.
> 
> So, suppose we have updated barebox 2nd stage from Linux (or barebox)?
>   Now it uses "common" ECC values (IMHO, "optimal" is not an accurate
> term here) from Linux kernel.  Barebox-xload currently works to boot
> this, but your change will break that.

OK, I see. This is a valid point. I didn't realize that updating only the
2nd stage barebox is common practice. Do you know of any imx6 board that
does that? I ask because this xloader is imx6 specific.

> 
> It is a difficult problem, either choice of ECC values could be the
> correct one.

Yes, I agree, either way we break booting in one of our use cases. In my
case the pre-bootloader wouldn't be read correctly and in your case the
main bootloader wouldn't be read correctly.

> 
>> In our case the described proprietary flasher tool only flashes barebox
>> so only NAND pages with barebox binary are using not optimal ECC
>> settings. If for example kernel, devicetree and rootfs would be flashed
>> from barebox the NAND pages there would use correct ECC size and booting
>> into linux and updating those NAND pages from linux works. Updating
>> barebox from barebox itself (using barebox_update) would mean that the
>> barebox binary will be overwritten in NAND with optimal ECC settings and
>> FCB will be updated accordingly.
> 
> Does barebox_update run in 2nd stage barebox update both 2nd stage
> barebox and barebox-xload + FCB?

Yes, it does.

> 
> Consider what happens if barebox 2nd stage is updated from Linux.
> Usually software update systems run on Linux, e.g. rauc or mender.  In
> this case it will use Linux ECC settings, not FCB settings.
> 
> You've got boards with barebox-xload and barebox using different ECC
> settings than kernel and rootfs.  And not just two different settings,
> but also 2nd stage barebox and Linux don't know this.  I predict this
> will be a source of much future pain.
> 
> 
>> We are only using these ECC values to read barebox binary from NAND and
>> copy it to RAM. If other NAND pages will be using different ECC values
>> that doesn't break anything, I think. Only problem that I can see here
>> is barebox or linux reading NAND pages occupied by barebox binary, this
>> will most likely fail, but I don't see why that would be necessary anyway.
>>
>> I don't think we are breaking anything here, we are just fixing booting
>> barebox from NAND with not optimal ECC settings.
>>
>> Please correct me if I'm wrong or if I'm missing something here?
> 
> You've got ECC settings for:
> (xload barebox) (kernel rootfs)
> But if someone had this:
> (xload) (barebox kernel rootfs)
> Then it breaks.

Yes, I agree. As I already wrote above, I didn't know this is a common way
of doing a bootloader update.

> 
> Why would they have that?  As I describe above, everything in the 2nd
> set is updated from Linux using some software update system.
> 
> Of course, the most common way is this:
> (xload barebox kernel rootfs)
> 
> With just one set, when the xload has two choices, FCB vs common
> values, both are the same, so even if barebox is updated from Linux it
> still works.
> 
> A solution that works for both cases, but is also ugly and difficult,
> is to try both.  If xload sees FCB values != calculated values, then
> just try both settings.  One is virtually assured that the incorrect
> settings will produce massive numbers of errors from BCH.  Read a
> couple pages and the settings which result in uncorrectable ECC errors
> on all pages are the wrong ones.
> 

Yes, that would be an ugly fix for this.

But I see one problem. If different ECC values are used for the
pre-bootloader and the main bootloader (as is the case in the example
you provided), we would have to read the pre-bootloader and the main
bootloader with different ECC settings.

So the xload would look something like:
- read a couple of pages from pre-bootloader and select appropriate 
"readtotal_pbl"
- copy pre-bootloader to RAM with selected "readtotal_pbl"
- read a couple of pages from main-bootloader and select appropriate 
"readtotal_main"
- copy the remaining pages (main barebox) to RAM with selected 
"readtotal_main"
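
As a rough sketch of the selection step in the list above: probe a few
pages with each candidate ECC strength and reject a candidate only if all
probed pages come back uncorrectable. The read_page callback, the toy NAND
model and all names below are hypothetical and only there to illustrate
the idea, this is not barebox code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: read one page with the given ECC strength (bits);
 * returns true if the page was correctable, false if uncorrectable. */
typedef bool (*read_page_fn)(unsigned int page, unsigned int ecc_bits,
			     void *ctx);

/*
 * Probe npages pages starting at first_page with each candidate strength.
 * Returns the first candidate for which at least one page was readable,
 * or -1 if every candidate failed on all probed pages.
 */
static int select_ecc_bits(const unsigned int *candidates, size_t ncand,
			   unsigned int first_page, unsigned int npages,
			   read_page_fn read_page, void *ctx)
{
	size_t i;
	unsigned int p;

	assert(candidates && read_page);
	for (i = 0; i < ncand; i++) {
		for (p = 0; p < npages; p++) {
			if (read_page(first_page + p, candidates[i], ctx))
				return (int)candidates[i];
		}
	}
	return -1;
}

/* Toy model: pages below boundary were written with pbl_bits, the
 * remaining pages with main_bits. */
struct fake_nand {
	unsigned int boundary;
	unsigned int pbl_bits;
	unsigned int main_bits;
};

/* A read only succeeds with the strength the page was written with. */
static bool fake_read_page(unsigned int page, unsigned int ecc_bits,
			   void *ctx)
{
	const struct fake_nand *nand = ctx;
	unsigned int written;

	written = page < nand->boundary ? nand->pbl_bits : nand->main_bits;
	return ecc_bits == written;
}
```

The xload would call select_ecc_bits() once at the start of the
pre-bootloader and once at the start of the main barebox, and use the
result for the corresponding copy. The second call needs the page where
the main barebox starts.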

Now for this we would need to find out where the PBL ends and the main
barebox starts (probably from boot data?).

This would solve all of the problems, right?

But is all this needed for such an extreme use case?

As I said, I don't know how common it is for a user to update only the 2nd
stage barebox, or how common it is to use flasher tools which use
different ECC settings than, for example, barebox and the kernel. Both of
these are needed to get an ECC mismatch, and I can't think of other cases
where a mismatch in ECC settings between the pre-bootloader and the 2nd
stage barebox would happen.

BR,
Andrej