kexec crash kernel boot failure on arm64
Anurup m
anurup.m at huawei.com
Thu Jul 23 19:07:24 PDT 2015
Hi All,
There is a problem observed with crash kernel boot in kdump on arm64.
On an arm64 hardware board, when I enable the purgatory segment, the crash kernel does not boot.
When checked with trace32, control reaches the purgatory_start routine,
but the instructions there are seen as UNDEF and the boot hangs. However, when I took a memory dump, the
contents were correct (matching the purgatory_start code).
I did some experiments to analyze this issue. I tried changing the load order of the kexec segments and
observed the results below.
------------------------------------------------------------------------------
   Segments load order                                  Crash kernel boot status
------------------------------------------------------------------------------
1) crash kernel, initrd, dtb, elfcorehdr                Boot success (without purgatory)
2) crash kernel, initrd, dtb, purgatory, elfcorehdr     Hung, as control does not reach the purgatory segment
3) crash kernel, elfcorehdr, purgatory, dtb, initrd     Boot success
4) crash kernel, initrd, dtb, purgatory, elfcorehdr,    Boot success
   an extra segment (~20M)
------------------------------------------------------------------------------
From this I could infer that if I load a larger segment after purgatory (in the load order), the crash
kernel boots, which suggests that the memory sync is taking some time.
To check whether memory sync is the issue, I tried flushing the data cache after writing the kexec segments.
 kernel/kexec.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 7bb25f0..ca36aa0 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1176,6 +1176,10 @@ static int kimage_load_crash_segment(struct kimage *image,
 		else
 			result = copy_from_user(ptr, buf, uchunk);
 		kexec_flush_icache_page(page);
+		/* Flush Dcache to make sure it is push to DRAM
+		 * This is added as workaround for crash kernel
+		 * boot failure */
+		__flush_dcache_area((__force void *)ptr, uchunk);
 		kunmap(page);
 		if (result) {
 			result = -EFAULT;
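
For context on what a by-address D-cache flush over (ptr, uchunk) covers: such a flush works one cache line at a time, starting from the line-aligned address at or below the start of the buffer. The sketch below models only that address arithmetic in portable C. The names for_each_cache_line and line_op_fn are hypothetical, not kernel APIs, and the callback merely stands in for the per-line maintenance instruction, so this is an illustration under those assumptions rather than the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for the per-line cache
 * maintenance operation; it is not a kernel interface. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Visit every cache line that overlaps [start, start + size).
 * line_size must be a power of two. Returns the number of lines
 * visited. op may be NULL if only the count is wanted. */
size_t for_each_cache_line(uintptr_t start, size_t size,
                           size_t line_size, line_op_fn op, void *ctx)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    if (size == 0)
        return 0;

    uintptr_t end = start + size;
    /* Round the start down to a cache-line boundary. */
    uintptr_t addr = start & ~((uintptr_t)line_size - 1);
    size_t visited = 0;

    for (; addr < end; addr += line_size) {
        if (op)
            op(addr, ctx);
        visited++;
    }
    return visited;
}
```

For example, assuming 64-byte lines, a 0x100-byte buffer at 0x1010 touches five lines (0x1000 through 0x1100), one more than the size alone suggests, because the buffer is not line-aligned.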
With the above change, control reaches purgatory_start, but this time it loops because the sha256 digest
verification fails. The crash kernel boots only after commenting out verify_sha256_digest.
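
To illustrate why stale segment contents would make purgatory loop: purgatory hashes the loaded segments and compares the result with a digest computed at load time, so a single byte that differs in memory is enough to fail the check. The sketch below shows that pattern only. It swaps SHA-256 for a 64-bit FNV-1a hash to stay short, and struct region, digest_regions and verify_digest are hypothetical names chosen for this example, not the purgatory code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded segment. */
struct region {
    const uint8_t *start;
    size_t len;
};

/* FNV-1a (64-bit) update step, used here as a stand-in for SHA-256. */
uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;          /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hash all regions in order, as one continuous stream. */
uint64_t digest_regions(const struct region *r, size_t n)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < n; i++)
        h = fnv1a_update(h, r[i].start, r[i].len);
    return h;
}

/* Returns 0 when the regions still match the digest recorded at
 * load time, 1 otherwise. */
int verify_digest(const struct region *r, size_t n, uint64_t expected)
{
    return digest_regions(r, n) != expected;
}
```

In this model, recording the digest over two buffers and then changing one byte in either of them makes verify_digest return 1, which corresponds to the looping behaviour described above.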
What could be the possible reasons for this issue? Please share your comments.
Note: This issue does not occur on the Foundation Model.
Thanks & Regards,
Anurup
More information about the linux-arm-kernel
mailing list