[PATCH v2] ARM: kexec: Use the right ISA for relocate_new_kernel

Fri Nov 15 13:11:43 EST 2013

On 11/15/2013 07:38 PM, Dave Martin wrote:
> On Fri, Nov 15, 2013 at 01:28:21PM +0200, Taras Kondratiuk wrote:
>> And the issue I'm frequently facing in reloaded kernel (Thumb from ARM)
>> is random crashes caused by undefined instructions.
>>
>> My observation summary:
>> - Before starting a second kernel I'm dumping loaded zImage and then
>>   unpacked Image at final location and they are correct, so no issue
>>   with loading.
>> - I observe two types of crash:
>>   1) Undefined instruction in the middle of kernel code. After a crash
>>      I check failing address and there is always a *valid* Thumb
>>      instruction (CPU is in Thumb mode).
>>   2) Jump to a wrong address which consequently causes undefined
>>      instruction exception. A trace of one example of a wrong jump is
>>      captured in [1]. Instead of jumping to 0xC049097C code gets
>>      executed at 0xED85E008. BTW the wrong address suspiciously looks
>>      like an ARM instruction.
> 
> That jump to 0xED85E008 certainly looks strange ... I wonder whether
> there could be some instructions missing from the trace.
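
For what it's worth, here is a rough way to sanity-check the "looks like
an ARM instruction" hunch. In the classic 32-bit ARM encoding, the top
four bits of an instruction word are the condition field, and 0xE (AL,
"always") is by far the most common value. The sketch below only tests
that field. The helper names are made up for illustration, and the check
is a weak heuristic: plenty of kernel virtual addresses also start with
0xE, so a match is a hint, not proof.

```python
# Hypothetical helpers, for illustration only (not from any kernel tool).

ARM_COND_AL = 0xE  # "always" condition code in the 32-bit ARM encoding


def arm_cond_field(word):
    """Return bits [31:28] of a 32-bit word (the ARM condition field)."""
    return (word >> 28) & 0xF


def looks_like_unconditional_arm(word):
    """Weak heuristic: True if the word carries the AL condition code."""
    return arm_cond_field(word) == ARM_COND_AL


if __name__ == "__main__":
    # The two addresses mentioned in the trace discussion above.
    for value in (0xC049097C, 0xED85E008):
        print(hex(value), looks_like_unconditional_arm(value))
```

The expected target 0xC049097C does not pass the check, while the bogus
target 0xED85E008 does, which is consistent with an instruction word
having been consumed as a branch address.
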
> 
> 
> How early do these crashes happen?

Very early: anywhere from setup_arch() up to the early initcalls.

> Is this happening on SMP, and if so, what is the state of secondary
> CPUs across kexec?

I have disabled CONFIG_SMP. The second CPU is busy-looping in ROM code
and shouldn't cause any issues.

> If secondary CPUs are not safely parked, or their caches are not drained
> before the kexec occurs, this can cause corruption of the new kernel
> or unpredictable behaviour of the secondary CPUs.
> 
>> - If second kernel is placed at different address (like in kdump case),
>>   then it boots fine and I don't observe any crashes.
>> - If I check failing address in the first kernel (ARM) the code there
>>   is really undefined instruction if executed as Thumb.
>> - Looks like pieces of the old ARM kernel get executed instead of the new
>>   Thumb kernel. But as I've mentioned I'm reading physical memory via
>>   JTAG before starting second kernel and memory is matching a compiled
>>   Thumb 'Image'. Icache also gets cleaned...
>> - Once when stopped on breakpoint I've seen a piece of ARM code in
>>   Thumb kernel. Interesting that I was looking at the same memory
> 
> Thumb kernels do contain a small amount of ARM code, in the vectors
> page for example.  But it's possible you were also looking at stale
> data.

Right, but I mean there was ARM code in a place where there should
definitely be Thumb code.

> 
>>   location via physical and virtual addresses simultaneously and only
>>   virtual address showed an old code. After a few memory browsing
> 
> It's possible that those views could be inconsistent either due to
> the behaviour of the debugger, or because inconsistent memory types
> are used to construct the two views.
> 
>>   operations, data at both addresses got synced to correct Thumb code.
>>   Sure it could be a debugger lag, but it fits nicely with other
>>   observations.
>>
>> Do you have some ideas what could cause such behavior?
> 
> Not really, apart from the above ideas.
> 
>>
>> Unfortunately I don't have more time now to debug it further,
>> but I will try to return to this later.
> 
> OK ... let me know if you see this again or get any more clues.
> 
> Cheers
> ---Dave
> 
>>
>> [1]
>> https://drive.google.com/file/d/0ByfnRzd5ZYtdQWJKc1k0VmxrZlE/edit?usp=sharing
>>
>> -- 
>> Taras Kondratiuk

-- 
Taras Kondratiuk