Regression: kexec/kdump boot hangs with x86/vector commits

Dave Young dyoung at redhat.com
Wed Jan 3 19:15:38 PST 2018


On 12/14/17 at 05:24pm, Dave Young wrote:
> On 12/13/17 at 11:57pm, Yu Chen wrote:
> > On Wed, Dec 13, 2017 at 10:52:56AM +0800, Dave Young wrote:
> > > Hi,
> > > 
> > > Kexec reboot and kdump have been broken on my laptop for a long time
> > > with 4.15.0-rc1+ kernels. The patch below fixed an early panic:
> > > https://patchwork.kernel.org/patch/10084289/
> > > 
> > > But I still cannot get a successful reboot. It looked like a graphics
> > > issue, but after bisecting the kernel, I got the following:
> > > 
> > > [dyoung at dhcp-*-* linux]$ git bisect good
> > > There are only 'skip'ped commits left to test.
> > > The first bad commit could be any of:
> > > 2db1f959d9dc16035f2eb44ed5fdb2789b754d6a
> > > 4900be83602b6be07366d3e69f756c1959f4169a
> > > We cannot bisect more!
> > > 
> > > These two commits cannot be reverted alone because of code conflicts,
> > > so I reverted the whole series from Thomas (below commits). With those
> > > x86/vector changes reverted, kexec reboot works fine.
> > > 
> > > Could you help take a look? Any thoughts? I can run tests
> > > if you have a debug patch to try.
> > Is it possible that the "second" kernel runs on a non-zero CPU? If so,
> > what if some IRQs are only delivered to CPU 0 (using cpumask_of(0)
> > directly)?
> 
> Thanks for the reply.
> 
> For kdump, yes; for kexec, I'm not sure.
> 
> Here is a kexec kernel boot log:
> http://people.redhat.com/~ruyang/misc/kexec-regression.txt
> 
> Copying the lockup call trace here:
> [   23.779285] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
> [   23.779285] Modules linked in: arc4 rtsx_pci_sdmmc i915 iwlmvm kvm_intel mac80211 kvm irqbypass btusb btrtl btbcm intel_gtt btintel drm_kms_helper snd_hda_intel syscopyarea bluetooth iwlwifi snd_hda_codec snd_hwdep snd_hda_core sysfillrect snd_seq sysimgblt input_leds fb_sys_fops e1000e ecdh_generic cfg80211 snd_seq_device drm snd_pcm serio_raw ptp pcspkr thinkpad_acpi i2c_i801 snd_timer rtsx_pci pps_core snd soundcore rfkill video
> [   23.779307] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc3+ #378
> [   23.779308] Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET92WW (2.42 ) 03/03/2017
> [   23.779312] RIP: 0010:poll_idle+0x2f/0x5f
> [   23.779313] RSP: 0018:ffffffff81c03e80 EFLAGS: 00000246
> [   23.779314] RAX: ffffffff81c0f4c0 RBX: ffffffff81c6db80 RCX: 0000000000000000
> [   23.779315] RDX: 0000000000000000 RSI: ffffffff81c6db80 RDI: ffff88021f2201e8
> [   23.779316] RBP: ffff88021f2201e8 R08: 000000349a65b7dd R09: ffff88021f216db4
> [   23.779317] R10: ffffffff81c03e68 R11: 0000000000000000 R12: 0000000000000000
> [   23.779318] R13: ffffffff81c6db98 R14: 0000000000000000 R15: 0000000578a065b1
> [   23.779319] FS:  0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000
> [   23.779320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   23.779321] CR2: 00007ffed1d0ee60 CR3: 000000021ec0a006 CR4: 00000000001606b0
> [   23.779322] Call Trace:
> [   23.779328]  cpuidle_enter_state+0x6a/0x2c0
> [   23.779333]  do_idle+0x17b/0x1d0
> [   23.779335]  cpu_startup_entry+0x6f/0x80
> [   23.779338]  start_kernel+0x431/0x451
> [   23.779342]  secondary_startup_64+0xa5/0xb0
> [   23.779344] Code: 00 fb 66 0f 1f 44 00 00 65 48 8b 04 25 40 c4 00 00 f0 80 48 02 20 48 8b 08 83 e1 08 74 0d eb 12 f3 90 65 48 8b 04 25 40 c4 00 00 <48> 8b 00 a8 08 74 ee 65 48 8b 04 25 40 c4 00 00 f0 80 60 02 df
> 

Following up on this issue: it seems another commit from Thomas (below)
partially fixed this. kexec/kdump now boots up successfully for me, but
kexec after kexec (the second kexec reboot cycle) failed; the kernel hung
early.

commit bc976233a872c0f20f018fb1e89264a541584e25
Author: Thomas Gleixner <tglx at linutronix.de>
Date:   Fri Dec 29 10:47:22 2017 +0100

    genirq/msi, x86/vector: Prevent reservation mode for non maskable MSI

Thanks
Dave


