mainline boot: 64 boots: 62 pass, 2 fail (v3.16-rc1-2-gebe0618)

Tushar Behera trblinux at
Wed Jun 25 23:44:12 PDT 2014

On 06/26/2014 03:27 AM, Laura Abbott wrote:
> On 6/25/2014 5:13 AM, Tushar Behera wrote:
>> On 06/25/2014 03:59 AM, Laura Abbott wrote:
>>> On 6/24/2014 10:47 AM, Laura Abbott wrote:
>>>> On 6/23/2014 11:32 AM, Kevin Hilman wrote:
>>>>> On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera <trblinux at> wrote:
>>>>>> Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
>>>>>> On 06/19/2014 04:12 PM, Tushar Behera wrote:
>>>>>>> On 06/19/2014 03:02 PM, Tushar Behera wrote:
>>>>>>>> On 06/18/2014 09:22 AM, Kevin Hilman wrote:
>>>>>>>>> On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera <trblinux at> wrote:
>>>>>>>>>> On 06/17/2014 10:23 PM, Kevin Hilman wrote:
>>>>>>>>>>> Sachin,
>>>>>>>>>>> On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot <khilman at> wrote:
>>>>>>>>>>>> Tree/Branch: mainline
>>>>>>>>>>>> Git describe: v3.16-rc1-2-gebe0618
>>>>>>>>>>>> Failed boot tests (console logs at the end)
>>>>>>>>>>>> ===========================================
>>>>>>>>>>>>      exynos5420-arndale-octa:     FAIL:    arm-exynos_defconfig
>>>>>>>>>>>>                 ste-snowball:     FAIL:    arm-u8500_defconfig
>>>>>>>>>>> FYI... these failures are getting more consistent on my octa board,
>>>>>>>>>>> but still not failing every time.
>>>>>>>>>>> Kevin
>>>>>>>>>> Hi Kevin,
>>>>>>>>>> Same here.
>>>>>>>>>> Observation: If you soft-reset the board (through the jumpers) after
>>>>>>>>>> getting this problem, the problem keeps repeating. But if you hard-reset
>>>>>>>>>> the board (by removing the power cord), the problem doesn't occur during
>>>>>>>>>> next iteration.
>>>>>>>>> I don't ever use the soft-reset, I only toggle the wall power.  I
>>>>>>>>> don't ever actually remove the power cord though, I'm using a
>>>>>>>>> USB-controlled relay to toggle the wall power.
>>>>>>>>> Kevin
>>>>>>>> Laura,
>>>>>>>> We are getting the following kernel panic [1] (not always, but quite
>>>>>>>> regularly) while booting Arndale-Octa (based on Samsung's Exynos5420)
>>>>>>>> board with upstream kernel. I haven't observed this issue with other
>>>>>>>> boards yet.
>>>>>>>> This issue is observed when I am booting with uImage + dtb (within
>>>>>>>> roughly ~10 iterations).
>>>>>>> Some more information:
>>>>>>> The boot logs are provided in pastebin, okay[2] and failed[3].
>>>>>>> In case of boot failures, I am getting a higher value for vm_total_pages
>>>>>>> (684424 in [3]). In case of successful boot, it is always
>>>>>>> 521232 [2] on my board.
>>>>> I can confirm that reverting the "Get rid of meminfo" patch gets the
>>>>> Octa board booting reliably again for me also.
>>>>> In case it helps, some boot logs for failures from the last couple
>>>>> linux-next build/boot cycles can be seen here:
>>>> Sorry, I missed this yesterday. I'm going to take a look.
>>> Were all of 
>>> collected on the same type of board with the same amount of DRAM? I'm seeing a
>>> different amount of total pages across all those logs. All the logs have the
>>> same lowmem limit so it seems like the upper bound was being calculated
>>> incorrectly for passing to free_area_init_node. Nothing is immediately jumping
>>> out at me so can you boot up with a small debug patch?
>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>> index 659c75d..88eac1f 100644
>>> --- a/arch/arm/mm/init.c
>>> +++ b/arch/arm/mm/init.c
>>> @@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>>         unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
>>>         struct memblock_region *reg;
>>> +       pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
>>> +       __memblock_dump_all();
>>>         /*
>>>          * initialise the zones.
>>>          */
>>> It would be helpful to do this across a few bootups to see if the values are
>>> actually consistent. I'll keep looking in the meantime.
>>> Thanks,
>>> Laura
>> Thanks Laura for the pointer. In case of error, I am getting some random
>> memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
>> The issue seems to be from u-boot, where it is not updating the memory
>> subnode properly. I have got a fix for the u-boot, which I am testing
>> right now. I will update tomorrow after I do some more tests.
> I'm concerned my change can stay as is if this is exposing an issue
> in u-boot. Asking people to change bootloaders rarely ends well. Can
> you elaborate on what u-boot is doing that would be exposing this
> issue?
> Thanks,
> Laura


Here is my assessment of the current situation.

*Bug in u-boot*
The current u-boot for the Arndale-Octa board defines NR_BANKS as 12, and
the core uses a global structure (gd->bd) to maintain the start and size
of the individual banks. Depending on the revision of the SoC used on the
board, the board file [1] updates the start/size for either 8 or 12
banks. On the current revision of Arndale-Octa boards, the board file
always updates the start/size for 8 banks, leaving the start/size data
for the remaining 4 banks uninitialized.

But the u-boot core [2] updates the memory node with the values of all
12 banks, thus potentially passing on invalid data for the last 4 banks.

The issue can be fixed by resetting the start/size for unused memory
banks to 0/0.[3]
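
To make the idea concrete, here is a minimal sketch of that kind of
fix. It is not the actual patch in [3]: the struct, the
SKETCH_NR_BANKS constant and the function name are hypothetical
stand-ins for u-boot's bank bookkeeping, and the only assumption is
that the board code knows how many banks it really populated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for u-boot's per-bank data; names are illustrative. */
#define SKETCH_NR_BANKS 12

struct sketch_dram_bank {
	unsigned long start;
	unsigned long size;
};

/*
 * Zero every bank after the first `populated` entries, so that a later
 * pass over all SKETCH_NR_BANKS entries sees start = 0, size = 0 there
 * instead of whatever happened to be in memory.
 */
static void sketch_clear_unused_banks(struct sketch_dram_bank *banks,
				      size_t populated)
{
	size_t i;

	assert(populated <= SKETCH_NR_BANKS);
	for (i = populated; i < SKETCH_NR_BANKS; i++) {
		banks[i].start = 0;
		banks[i].size = 0;
	}
}
```

With 8 populated banks, entries 8 to 11 end up as 0/0 and the first 8
are left untouched.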

*Before migration to memblock*
DRAM banks were added through [4]. For Exynos systems, NR_BANKS was
defined as 8. The initial check that rejects any banks beyond NR_BANKS
was good enough to avoid this issue. The bootlog [5] (with some debug
messages) shows the invalid data, both in u-boot and in the kernel.
Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.

*After migration to memblock*
Now that the memory banks are added through [6], all the memory banks
are added unconditionally, resulting in the panic.
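
The difference between the two paths can be illustrated with a small
user-space sketch. This is not the kernel code behind [4] or [6]; the
types, the SKETCH_KERNEL_NR_BANKS constant and both function names are
hypothetical, and only the warning text is taken from the bootlog
message mentioned above.

```c
#include <stddef.h>
#include <stdio.h>

struct sketch_bank {
	unsigned long long start;
	unsigned long long size;
};

/* Cap the old path applied on Exynos, per the description above. */
#define SKETCH_KERNEL_NR_BANKS 8

/*
 * Old-style path: a fixed-size table. Entries past the cap are dropped
 * with a warning, so garbage in banks 9..12 never reaches the kernel.
 * Returns the number of banks accepted.
 */
static size_t sketch_add_capped(const struct sketch_bank *in, size_t n,
				struct sketch_bank *out)
{
	size_t used = 0, i;

	for (i = 0; i < n; i++) {
		if (used >= SKETCH_KERNEL_NR_BANKS) {
			fprintf(stderr,
				"NR_BANKS too low, ignoring memory at 0x%llx\n",
				in[i].start);
			continue;
		}
		out[used++] = in[i];
	}
	return used;
}

/*
 * New-style path: every entry handed over by the bootloader is
 * accepted, including any that were never initialized.
 * Returns the number of banks accepted.
 */
static size_t sketch_add_unconditional(const struct sketch_bank *in, size_t n,
				       struct sketch_bank *out)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = in[i];
	return n;
}
```

Feeding 12 banks to the capped version keeps 8 and warns about the
rest, while the unconditional version keeps all 12, which is how the
stale data from u-boot ends up in the kernel's view of memory.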

IMO, the bug is in u-boot and we should fix that.


Tushar Behera
