kexec fails (pretty often)

Zou Nan hai nanhai.zou at intel.com
Tue Jul 3 20:29:39 EDT 2007


On Wed, 2007-07-04 at 04:24, Eric W. Biederman wrote:
> "Natalie Protasevich" <protasnb at gmail.com> writes:
> 
> > I came across a report about panics on an IA64 system that happen when
> > kexec is executed. The following FSB parity errors are generated:
> >
> > BRLD / UC to x8208208208,   A43:41 = x0,  FSB Parity Error detected on Processor Request
> > BRLC / UC to xFFFF2000000,  A43:41 = x7,  FSB Parity Error detected on the Deferred Reply
> > BRLD / WB to xFFFFFFF0028,  A43:41 = x7,  FSB Parity Error detected on the Deferred Reply
> > BRLD / WB to xFFFFFFF0028,  A43:41 = x7,  FSB Parity Error detected on the Deferred Reply
> > BRLC / UC to xFFFF2000000,  A43:41 = x7,  FSB Parity Error detected on the Deferred Reply
> > BRLD / UC to x8208208208,   A43:41 = x0,  FSB Parity Error detected on Processor Request
> >
> >
> > The address pattern seen on the bus actually comes from this
> > piece of code in arch/ia64/kernel/gate.S, which calculates ar.bspstore:
> >
> > ...
> >        sub r14=r14,r17         // r14 <- -rse_num_regs(bspstore1, bsp1)
> >        movl r17=0x8208208208208209
> >        ;;
> >        add r18=r18,r14         // r18 (delta) <- rse_slot_num(bsp0) - rse_num_regs(bspstore1,bsp1)
> >        setf.sig f7=r17
> >        cmp.lt p7,p0=r14,r0     // p7 <- (r14 < 0)?
> >        ;;
> > ...
> >
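
The bus address x8208208208 matches the repeating bit pattern of the constant loaded by `movl r17=0x8208208208208209`. That constant equals ceil(2^69 / 63). On IA-64, every 64th slot of the RSE backing store holds NaT collection bits, so only 63 of every 64 slots hold registers, and converting between register counts and slot addresses involves a division by 63. The `setf.sig f7=r17` in the excerpt moves the constant into a floating-point register, which suggests the code that follows performs that division as a signed multiply-high by this reciprocal rather than a divide. The C sketch below illustrates that technique only; it is not a transcription of gate.S. It assumes GCC or Clang (for `__int128`) and arithmetic right shifts of negative values, and the names `RECIP63`, `mulhs64` and `div63_magic` are hypothetical.

```c
#include <stdint.h>

/* ceil(2^69 / 63), the constant loaded by `movl r17=0x8208208208208209`. */
#define RECIP63 0x8208208208208209ULL

/* High 64 bits of a signed 64x64 product (the role xmpy.h plays on ia64).
 * Assumes __int128 support and an arithmetic >> on negative values. */
static int64_t mulhs64(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}

/* trunc(n / 63) computed without a divide instruction. */
static int64_t div63_magic(int64_t n)
{
    /* As int64_t, RECIP63 wraps to RECIP63 - 2^64 (GCC/Clang behaviour). */
    int64_t q = mulhs64((int64_t)RECIP63, n);
    q += n;                             /* undo the -2^64 wrap: now floor(RECIP63 * n / 2^64) */
    q >>= 5;                            /* total shift 64 + 5 = 69, so q = floor(n / 63) */
    q += (int64_t)((uint64_t)n >> 63);  /* +1 for negative n: round toward zero */
    return q;
}
```

For example, `div63_magic(126)` returns 2 and `div63_magic(-64)` returns -1, matching C's truncating `n / 63`.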


Hi,

Is the problem reproducible? Is there any special configuration or kexec
command-line option needed to reproduce it?
On which platform and which kernel version did you see the issue?

It looks like there may be something wrong with the memory map setup
of the second kernel.
Can you send me copies of /proc/iomem from both the first and the second
kernel?
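
Comparing the two /proc/iomem listings is easier once each line is split into its address range and name. The C sketch below parses one line in the `start-end : name` format that /proc/iomem prints, assuming nested entries are indented by two spaces per level. The struct and function names (`iomem_range`, `parse_iomem_line`) are hypothetical, and comparing the first-kernel and second-kernel maps is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one /proc/iomem entry. */
struct iomem_range {
    unsigned long long start;
    unsigned long long end;   /* inclusive, as printed by /proc/iomem */
    int depth;                /* nesting level, assuming two spaces of indent per level */
    char name[64];
};

/* Parse a line such as "  00100000-004fffff : Kernel code".
 * Returns 1 on success, 0 if the line does not match the format. */
static int parse_iomem_line(const char *line, struct iomem_range *r)
{
    int indent = 0;

    memset(r, 0, sizeof *r);
    while (line[indent] == ' ')
        indent++;
    r->depth = indent / 2;

    /* " : " in the format matches the separator; %63[^\n] keeps the whole name. */
    if (sscanf(line + indent, "%llx-%llx : %63[^\n]",
               &r->start, &r->end, r->name) != 3)
        return 0;
    return 1;
}
```

Parsing both listings this way makes it straightforward to check whether, for example, the System RAM ranges seen by the second kernel differ from those of the first.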

Thanks
Zou Nan hai


> > Have you seen such an error before? What would you recommend for debugging this?
> 
> Not really.
> 
> However, this sounds fairly deterministic on the hardware involved,
> so I would recommend a code audit.
> 
> With low-level kexec code like this, it really takes someone who knows
> the architecture to think through the code.
> 
> Adding serial output to the assembly and so on can help to
> isolate the piece of code causing the problem, but it looks
> like you have done that.
> 
> You haven't provided quite enough context for me to understand how
> this code sequence is reached. I would certainly need more
> information than you have given to even locate the code path this is
> coming from, as it has been a long time since I looked at ia64.
> 
> I have CC'd a few likely suspects and the kexec list, so with a little
> luck someone familiar with this can answer you.
> 
> Eric


