mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)

Kevin Hilman khilman at linaro.org
Thu Jun 26 07:59:19 PDT 2014


Hi Tushar,

> Here is my assessment of the current situation.

Thanks for digging into this and the detailed diagnosis.

> *Bug in the u-boot*
> Current u-boot for Arndale-octa board has defined NR_BANKS as 12 and the
> core uses a global structure (gd->bd) to maintain the start and size of
> individual banks. Depending on the revision of SoC used on the board,
> the board file [1] updates the start/size for either 8 or 12 banks. In
> case of current revision of Arndale-Octa boards, the board file always
> updates start/size for 8 banks, leaving the start/size data for
> remaining 4 banks uninitialized.
>
> But the u-boot core[2] updates the values for all 12 banks, thus
> potentially writing invalid data for the last 4 banks.
>
> The issue can be fixed by resetting the start/size for unused memory
> banks to 0/0.[3]
>
> *Before migration to memblock*
> The path for adding DRAM banks was done through [4]. For Exynos systems,
> NR_BANKS was defined as 8. The initial check for rejecting any banks
> beyond NR_BANKS was good enough for fixing this issue. The bootlog[5]
> (with some debug messages) shows the invalid data, both in u-boot and
> kernel. Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.
>
> *After migration to memblock*
> Now that the memory banks are added through [6], all the memory banks
> are getting updated unconditionally, resulting in the panic.
>
> IMO, the bug is in u-boot and we should fix that.

I agree that the u-boot bug needs to be fixed, and FWIW, I updated my
u-boot and haven't seen the boot failure yet after several boots with
next-20140625.

That being said, updating u-boot is not always feasible or practical,
and when it comes down to it, this is still a kernel regression. We
should also fix the kernel to sanity-check the values coming from
u-boot, as it was doing before.

Could you (or Laura) come up with a way to recreate the sanity check
that was detecting this problem before and ignoring those banks?
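
To make the idea concrete, here is a rough, standalone sketch of that
kind of check. It is not the actual kernel code and not a proposed
patch: struct mem_bank, MAX_BANKS and filter_banks() are hypothetical
names made up for illustration, and MAX_BANKS is set to 8 only to
mirror the old NR_BANKS value mentioned above. It drops banks that are
empty, that wrap the address space, or that exceed the bank limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical limit, chosen to mirror the old NR_BANKS of 8. */
#define MAX_BANKS 8

/* Hypothetical description of one DRAM bank as handed over by the bootloader. */
struct mem_bank {
	uint64_t start;
	uint64_t size;
};

/*
 * Copy the plausible banks from 'in' to 'out' (which must have room for
 * MAX_BANKS entries) and return how many were accepted.
 */
static int filter_banks(const struct mem_bank *in, int n_in,
			struct mem_bank *out)
{
	int n_out = 0;

	assert(in != NULL && out != NULL);

	for (int i = 0; i < n_in; i++) {
		/* An unused bank reset to 0/0 carries no memory. */
		if (in[i].size == 0)
			continue;

		/* Reject ranges whose end wraps around the address space. */
		if (in[i].start + in[i].size < in[i].start)
			continue;

		/* Ignore anything beyond the limit instead of trusting it. */
		if (n_out >= MAX_BANKS) {
			fprintf(stderr,
				"bank limit reached, ignoring memory at 0x%llx\n",
				(unsigned long long)in[i].start);
			continue;
		}

		out[n_out++] = in[i];
	}

	return n_out;
}
```

Fed 12 entries where only the first 8 are initialized, this keeps the
first 8 and ignores the rest, whatever the trailing entries contain.
Where such a check would actually live in the memblock path is left
open here.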

Thanks,

Kevin


