[PATCH v4 0/3] x86, apic, kexec: Add disable_cpu_apic kernel parameter
d.hatayama at jp.fujitsu.com
Sun Nov 10 23:49:41 EST 2013
(2013/11/07 4:02), jerry.hoemann at hp.com wrote:
> On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
>> This patch set allows the kdump 2nd kernel to wake up multiple CPUs
>> even if the 1st kernel crashes on some AP. It continues the work from:
>> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
>> In this version, the basic design has changed. Users now need to find
>> the initial APIC ID of the BSP in the 1st kernel and configure the
>> 2nd kernel manually using the disable_cpu_apic kernel parameter, which
>> is newly introduced in this patch set. This design is more flexible
>> than the previous version in that we no longer have to rely on the
>> ACPI/MP table to get the initial APIC ID of the BSP.
>> Sorry, this patch set does not yet include the in-source documentation
>> requested by Borislav Petkov. I'll post it separately later, so that
>> review of the documentation can be done on its own.
>> v3 => v4)
>> - Rebased on top of v3.12-rc6
>> - The basic design has been changed. Users now need to find the initial
>> APIC ID of the BSP in the 1st kernel and configure the 2nd kernel
>> manually using the disable_cpu_apic kernel parameter, which is newly
>> introduced in this patch set. This design is more flexible than the
>> previous version in that we no longer have to rely on the
>> ACPI/MP table to get the initial APIC ID of the BSP.
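As an illustration of the manual step described in the cover letter, here is a minimal Python sketch that pulls the "initial apicid" field for one logical processor out of /proc/cpuinfo text and formats a parameter string. It assumes that the BSP is logical processor 0 in the 1st kernel and that the parameter takes the form disable_cpu_apic=<initial APIC ID>; both are assumptions for illustration, and the helper names are hypothetical.

```python
def initial_apicid(cpuinfo_text, processor=0):
    """Return the 'initial apicid' of the given logical processor, or None.

    cpuinfo_text is the content of /proc/cpuinfo on an x86 system.
    """
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "initial apicid" and current == processor:
            return int(value)
    return None


def capture_kernel_param(apicid):
    # Assumed syntax (not confirmed here): disable_cpu_apic=<initial APIC ID>
    return f"disable_cpu_apic={apicid}"
```

On the 1st kernel one would read /proc/cpuinfo, pass its contents to initial_apicid(), and append the resulting string to the command line used for the 2nd kernel.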
> I have backported version 4 of this patch to both 2.6.32- and 3.0.80-
> based kernels and distros and tested on a prototype system. I have
> previously tested versions 1 and 3 as well.
> The systems are configured to boot the capture kernel 8-way parallel.
> However, I am running makedumpfile single threaded.
> Panic is induced via "echo c > /proc/sysrq-trigger". This is done
> under various system loads and on random cpus. I have done over a
> thousand dumps total during this testing.
Thanks for your testing.
> I have seen no issues w/ the 3.0.80 dump testing on our proto.
> On the 2.6.32 testing on our proto, I have hit a low-probability (< 5%)
> soft lockup hang of the capture kernel during
> "Switching to clocksource hpet." I have not RCA'd this yet.
> Note, I have seen this issue on earlier versions of the patch, so
> it is not specific to this version.
> I then tested the 2.6.32 port on a dl380. This worked without issue.
> Note, I have seen no issues related to this patch on our proto when
> booting the capture with a single processor.
> While I am still pursuing the issue of the 2.6.32 kernel on our proto,
> I believe this patch is good and should be accepted.
It seems there is something here that depends on the system you used.
However, I have never verified my patch set on a 2.6.32-based kernel.
I'll try to do a similar test on some FJ systems.
By the 2.6.32-based kernel, do you mean one of the Longterm release
kernels? That is, did you run the test on a 2.6.32-based Longterm
release kernel with my v4 patch applied?
The root cause seems to have already been fixed in recent kernels, since
you didn't see the bug on the 3.0.80-based kernel, so I think a binary
search between the two versions would be useful.
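The binary search suggested above could be sketched roughly as follows. Because the hang was reported as low probability (< 5%), a single clean boot says little, so the sketch also estimates how many clean runs are needed before calling a version good. The 95% confidence level, the use of 5% as the failure rate, and the function names are assumptions for illustration only.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Number of consecutive clean runs needed so that a kernel which
    hangs with probability failure_rate per run would have shown the
    hang at least once with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def first_fixed(versions, is_fixed):
    """Binary search for the first version for which is_fixed() is True.

    Assumes versions are ordered oldest to newest, the oldest one shows
    the bug, the newest one does not, and the fix stays in once it lands.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] is bad, versions[hi] is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

Here is_fixed() would stand for "boot the capture kernel clean_runs_needed(0.05) times without a hang"; in practice git bisect could drive the same search over commits.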