[PATCH v2] ARM: kexec: Use the right ISA for relocate_new_kernel
Taras Kondratiuk
taras.kondratiuk at linaro.org
Fri Nov 15 13:11:43 EST 2013
On 11/15/2013 07:38 PM, Dave Martin wrote:
> On Fri, Nov 15, 2013 at 01:28:21PM +0200, Taras Kondratiuk wrote:
>> And the issue I'm frequently facing in the reloaded kernel (Thumb from ARM)
>> is random crashes caused by undefined instructions.
>>
>> My observation summary:
>> - Before starting the second kernel I dump the loaded zImage and then
>> the unpacked Image at its final location, and both are correct, so
>> there is no issue with loading.
>> - I observe two types of crash:
>> 1) Undefined instruction in the middle of kernel code. After a crash
>> I check the failing address and there is always a *valid* Thumb
>> instruction there (CPU is in Thumb mode).
>> 2) Jump to a wrong address, which consequently causes an undefined
>> instruction exception. A trace of one example of a wrong jump is
>> captured in [1]. Instead of jumping to 0xC049097C, code gets
>> executed at 0xED85E008. BTW the wrong address looks suspiciously
>> like an ARM instruction.
>
> That jump to 0xED85E008 certainly looks strange ... I wonder whether
> there could be some instructions missing from the trace.
>
>
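To make the "looks like an ARM instruction" remark concrete: in the ARM (A32) encoding, bits [31:28] of an instruction word hold the condition field, and most compiled ARM code uses 0xE (AL, "always") there. A minimal Python sketch of that check follows. The helper names are hypothetical, and it only inspects the condition field; it does not decode the instruction.

```python
# Condition field names for bits [31:28] of an ARM (A32) instruction word.
# 0xF is the unconditional instruction space on newer architectures.
ARM_COND_NAMES = [
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE", "AL", "UNCOND",
]


def arm_cond(word):
    """Return the condition-field name of a 32-bit word read as an ARM instruction."""
    return ARM_COND_NAMES[(word >> 28) & 0xF]


def has_al_condition(word):
    """True if the top nibble is 0xE (AL), as in most compiled ARM code."""
    return ((word >> 28) & 0xF) == 0xE


if __name__ == "__main__":
    for value in (0xED85E008, 0xC049097C):
        print(hex(value), arm_cond(value), has_al_condition(value))
```

The bogus target 0xED85E008 has 0xE in its top nibble, while the intended target 0xC049097C does not. This is only a heuristic: a 0xE top nibble is consistent with an instruction word having been used as a branch target, but it does not prove it.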
> How early do these crashes happen?
At very early stages, from setup_arch() up to early initcalls.
> Is this happening on SMP, and if so, what is the state of secondary
> CPUs across kexec?
I have disabled CONFIG_SMP. The second CPU is busy-looping in ROM code and
shouldn't cause any issues.
> If secondary CPUs are not safely parked, or their caches are not drained
> before the kexec occurs, this can cause corruption of the new kernel
> or unpredictable behaviour of the secondary CPUs.
>
>> - If the second kernel is placed at a different address (as in the kdump
>> case), then it boots fine and I don't observe any crashes.
>> - If I check the failing address in the first kernel (ARM), the code there
>> really is an undefined instruction if executed as Thumb.
>> - It looks like pieces of the old ARM kernel get executed instead of the
>> new Thumb kernel. But as I've mentioned, I'm reading physical memory via
>> JTAG before starting the second kernel and the memory matches the compiled
>> Thumb 'Image'. Icache also gets cleaned...
>> - Once, when stopped on a breakpoint, I saw a piece of ARM code in the
>> Thumb kernel. Interestingly, I was looking at the same memory
>
> Thumb kernels do contain a small amount of ARM code, in the vectors
> page for example. But it's possible you were also looking at stale
> data.
Right, but I mean there was ARM code in a place where there definitely
should be Thumb code.
>
>> location via physical and virtual addresses simultaneously and only
>> the virtual address showed old code. After a few memory browsing
>
> It's possible that those views could be inconsistent either due to
> the behaviour of the debugger, or because inconsistent memory types
> are used to construct the two views.
>
>> operations, data at both addresses got synced to the correct Thumb code.
>> Sure, it could be debugger lag, but it fits nicely with the other
>> observations.
>>
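The dump-versus-Image comparison mentioned above can be sketched roughly as follows. This is a minimal Python sketch, assuming the JTAG dump and the compiled Image are both available as raw bytes; the function name and the `base` parameter are hypothetical and not part of any debugger's tooling.

```python
def mismatched_ranges(dump, image, base=0):
    """Return a list of (start, end) address ranges, end exclusive, where
    `dump` differs from `image`. Only the common length is compared.
    `base` is the address that offset 0 of both buffers corresponds to."""
    ranges = []
    start = None
    length = min(len(dump), len(image))
    for offset in range(length):
        if dump[offset] != image[offset]:
            if start is None:
                start = offset          # a new mismatching run begins
        elif start is not None:
            ranges.append((base + start, base + offset))
            start = None                # the run ended at this offset
    if start is not None:
        ranges.append((base + start, base + length))
    return ranges


if __name__ == "__main__":
    # Tiny synthetic example; real input would be read from files with
    # open(path, "rb").read().
    expected = bytes([0x00, 0x01, 0x02, 0x03])
    dumped = bytes([0x00, 0xFF, 0xFF, 0x03])
    for lo, hi in mismatched_ranges(dumped, expected, base=0x1000):
        print("mismatch: 0x%08x-0x%08x" % (lo, hi))
```

An empty result only says that the bytes returned by the debugger match the Image. It says nothing about what the CPU fetches through its caches or through a different mapping.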
>> Do you have some ideas what could cause such behavior?
>
> Not really, apart from the above ideas.
>
>>
>> Unfortunately I don't have more time now to debug it further,
>> but I will try to return to this later.
>
> OK ... let me know if you see this again or get any more clues.
>
> Cheers
> ---Dave
>
>>
>> [1]
>> https://drive.google.com/file/d/0ByfnRzd5ZYtdQWJKc1k0VmxrZlE/edit?usp=sharing
>>
>> --
>> Taras Kondratiuk
--
Taras Kondratiuk