[PATCH 1/7] ia64, kdump: Mask MCA/INIT on freezing cpus

Hidetoshi Seto seto.hidetoshi at jp.fujitsu.com
Wed Jun 24 22:15:59 EDT 2009


Robin Holt wrote:
> The concern is that any time we prevent SAL from receiving control during
> an MCA/INIT, we reduce the maintainability of the machine.  Having them
> masked at any time results in the NMI/INIT not recording the PROM record
> which we use to diagnose where the hang is.

Please think about servers which have no such PROM record feature...

The original problem here, which I wrote these patches for, is that an
INIT can block retrieving the crashdump via kdump.  The crashdump is the
only record we can use to diagnose where the hang is if a PROM record
like the one SGI servers have is not supported.
(I guess that even if the PROM record is supported, the crashdump is a
 better, more important resource for troubleshooting.)

My patches will mask MCA/INIT on all CPUs once kdump is invoked (via
panic or INIT), and soon afterwards unmask the one CPU that is going to
jump into the 2nd kernel (= kdump kernel), after registering a do-nothing
handler.

If there was a pending INIT, it will be received on that CPU as soon as
it is unmasked.  Then the PROM will make a record of it, pass control
to OS_INIT, which does nothing, and return to the interrupted context to
continue processing the kdump.
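
To make the ordering above concrete, here is a minimal user-space sketch.
It is only a toy model of the sequence (mask all CPUs, register a
do-nothing handler, unmask the CPU that boots the 2nd kernel, receive the
pending INIT).  All names in it (sim_cpu, sim_kdump_enter, and so on) are
hypothetical and are not taken from the patches or from the ia64 kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the ia64 kernel. */
struct sim_cpu {
    bool init_masked;   /* INIT delivery is held off while true */
    bool init_pending;  /* an INIT arrived while masked */
    int  prom_records;  /* records the (simulated) PROM has made */
    int  os_init_calls; /* times the OS_INIT handler ran on this CPU */
};

typedef void (*os_init_fn)(struct sim_cpu *cpu);

/* Currently registered OS_INIT handler; NULL means none. */
os_init_fn sim_os_init = NULL;

/* The do-nothing handler: it only counts that it was entered. */
void sim_nop_os_init(struct sim_cpu *cpu)
{
    cpu->os_init_calls++;
}

/* Deliver an INIT: the PROM records it, then OS_INIT runs if registered. */
void sim_deliver_init(struct sim_cpu *cpu)
{
    cpu->prom_records++;
    if (sim_os_init != NULL)
        sim_os_init(cpu);
}

/* Raise an INIT: it stays pending while the CPU is masked. */
void sim_raise_init(struct sim_cpu *cpu)
{
    if (cpu->init_masked)
        cpu->init_pending = true;
    else
        sim_deliver_init(cpu);
}

/* Unmask one CPU; a pending INIT is received immediately. */
void sim_unmask(struct sim_cpu *cpu)
{
    cpu->init_masked = false;
    if (cpu->init_pending) {
        cpu->init_pending = false;
        sim_deliver_init(cpu);
    }
}

/* Steps when kdump is invoked: mask all, register no-op, unmask boot CPU. */
void sim_kdump_enter(struct sim_cpu *cpus, size_t n, size_t boot_cpu)
{
    assert(boot_cpu < n);
    for (size_t i = 0; i < n; i++)
        cpus[i].init_masked = true;
    sim_os_init = sim_nop_os_init;
    sim_unmask(&cpus[boot_cpu]);
}
```

In this model an INIT raised while masked is not lost: it is recorded by
the PROM at unmask time and the kdump path continues, while the other
CPUs stay masked.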

Which point in time are you concerned about?


> In other patches, you implemented a do-nothing handler.  Could that
> be used?

... How?  Maybe I did not catch your point.

It would be useful, but it is only available from the beginning of the 2nd
kernel (to be exact, from the end of the 1st kernel) until the new INIT
handlers for the 2nd kernel are registered.


> Alternatively, when the machine is first booted, the handler is defined
> by SAL as a SAL routine.  Could you record that during kernel boot and
> then just set the handler back to the SAL provided one prior to starting
> the kexec kernel boot?  At that point, the machine is more like the
> first boot.  Now that I think about this, this alternative seems fairly
> attractive.

I think it is definitely the wrong thing if SAL provides its initial
handler as an OS_INIT which can be removed/replaced by the OS.

Since an INIT event is processed as PAL_INIT -> SAL_INIT -> OS_INIT (if
available), SAL should keep the entry point of its initial handler and
should use it from SAL_INIT when OS_INIT is not registered.  Ditto for
OS_MCA.
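
As a rough illustration of that chain, the sketch below models a SAL that
keeps its own handler privately and falls back to it whenever no OS_INIT
is registered.  It is a user-space toy, not real firmware or kernel code,
and every name in it (sim_sal, sim_sal_set_os_init, the return values) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Which handler ended up servicing the INIT (toy return values). */
enum sim_handled_by { SIM_BY_SAL_DEFAULT = 1, SIM_BY_OS_INIT = 2 };

typedef enum sim_handled_by (*sim_init_fn)(void);

struct sim_sal {
    sim_init_fn sal_default; /* kept by SAL itself, never replaced by OS */
    sim_init_fn os_init;     /* NULL while the OS has nothing registered */
};

enum sim_handled_by sim_sal_default_init(void) { return SIM_BY_SAL_DEFAULT; }
enum sim_handled_by sim_os_init_example(void)  { return SIM_BY_OS_INIT; }

/* The OS registers (fn != NULL) or unregisters (fn == NULL) OS_INIT. */
void sim_sal_set_os_init(struct sim_sal *sal, sim_init_fn fn)
{
    sal->os_init = fn;
}

/* SAL_INIT: hand over to OS_INIT if available, else use SAL's own handler. */
enum sim_handled_by sim_sal_init(const struct sim_sal *sal)
{
    assert(sal->sal_default != NULL);
    if (sal->os_init != NULL)
        return sal->os_init();
    return sal->sal_default();
}

/* PAL_INIT: always passes the event on to SAL_INIT in this model. */
enum sim_handled_by sim_pal_init(const struct sim_sal *sal)
{
    return sim_sal_init(sal);
}
```

With this shape, unregistering OS_INIT before starting the kexec kernel
brings back the first-boot behaviour without the OS having to save and
restore a SAL entry point.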


Thanks,
H.Seto



