Random stack corruption on v5.13 with dra76

Tomi Valkeinen tomi.valkeinen at ideasonboard.com
Fri May 21 01:45:29 PDT 2021


On 21/05/2021 10:39, Tony Lindgren wrote:
> * Tomi Valkeinen <tomi.valkeinen at ideasonboard.com> [210521 07:05]:
>> On 21/05/2021 08:36, Tony Lindgren wrote:
>>> * Tomi Valkeinen <tomi.valkeinen at ideasonboard.com> [210520 08:27]:
>>>> Hi,
>>>>
>>>> I've noticed that the v5.13 rcs crash randomly (but quite often) on dra76 evm
>>>> (I haven't tested other boards). Anyone else seen this problem?
>>>
>>> I have not seen this so far and beagle-x15 is behaving for me.
>>>
>>> Does it always happen on boot?
>>
>> No, but quite often. I can't really say how often, as it's annoyingly random.
>> I tried to bisect, but that proved to be difficult as sometimes I get multiple (5+)
>> successful boots before the crash.
>>
>> I tested with x15, same issue (below). So... Something in my kernel config? Or compiler?
>> Looks like the crash always happens very soon after (or during) probing palmas.
> 
> After about 10 reboots with your .config I'm seeing it now too on
> beagle-x15. So far no luck reproducing it with omap2plus_defconfig.

I think I have an easy way to tell whether a kernel is good or bad:
print stack_not_used(current) in the first call to
omap_i2c_xfer_irq(). The amount of unused stack it reports drops
hugely between v5.12 and v5.13-rc1.
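
For anyone who wants to see what that number means: as far as I
understand it, stack_not_used() (available with
CONFIG_DEBUG_STACK_USAGE) walks up from the low end of the task stack,
skips the end-of-stack canary word, and counts words that are still
zero. Below is a minimal userspace sketch of that idea, not the kernel
code. The names sim_stack, sim_stack_init, sim_stack_use,
sim_stack_not_used, SIM_STACK_WORDS and SIM_END_MAGIC are all made up
for illustration, and it assumes a downward-growing stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel task stack (sizes are arbitrary). */
#define SIM_STACK_WORDS 2048
#define SIM_END_MAGIC   0x57AC6E9DUL  /* illustrative canary value */

struct sim_stack {
	/* words[0] is the lowest address (stack end), the top is the last word */
	unsigned long words[SIM_STACK_WORDS];
};

/* Zero the whole stack and place a canary in the lowest word. */
static void sim_stack_init(struct sim_stack *s)
{
	memset(s, 0, sizeof(*s));
	s->words[0] = SIM_END_MAGIC;
}

/* Pretend a call chain dirtied `bytes` bytes, starting from the top. */
static void sim_stack_use(struct sim_stack *s, size_t bytes)
{
	size_t nwords = bytes / sizeof(unsigned long);
	size_t i;

	/* Never touch the canary word in this simulation. */
	assert(nwords <= SIM_STACK_WORDS - 1);
	for (i = 0; i < nwords; i++)
		s->words[SIM_STACK_WORDS - 1 - i] = 0xA5A5A5A5UL;
}

/*
 * Skip the canary, then scan upwards until the first non-zero word.
 * The result is the number of bytes never written so far, i.e. a
 * high-water mark rather than the current stack depth.
 */
static size_t sim_stack_not_used(const struct sim_stack *s)
{
	size_t i = 0;

	do {
		i++;
	} while (i < SIM_STACK_WORDS && s->words[i] == 0);

	return i * sizeof(unsigned long);
}
```

Since the scan reports a high-water mark, comparing the value before
and after a call only shows a difference when that call went deeper
than anything that ran earlier on the same stack.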

And interestingly, a simple printk sometimes seems to use hundreds of
bytes of stack (comparing stack_not_used() before and after the
print), but not always. So maybe the issue is somehow related to printk.

I'm bisecting.

  Tomi


