[V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

Wed Dec 2 03:57:38 PST 2015

Hello Borislav,

Sorry for the late reply to this mail.

> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
...
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +	int old_cpu, this_cpu;
> > +
> > +	/*
> > +	 * Only one CPU is allowed to execute the crash_kexec() code as with
> > +	 * panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +	 * may stop each other.  To exclude them, we use panic_cpu here too.
> > +	 */
> > +	this_cpu = raw_smp_processor_id();
> > +	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > +	if (old_cpu == -1) {
> > +		/* This is the 1st CPU which comes here, so go ahead. */
> > +		__crash_kexec(regs);
> > +
> > +		/*
> > +		 * Reset panic_cpu to allow another panic()/crash_kexec()
> > +		 * call.
> 
> So can we make __crash_kexec() return error values?
> 
> * failed to grab kexec_mutex -> reset panic_cpu
> 
> * no kexec_crash_image -> no need to reset it, all future crash_kexec()
> calls won't work so no need to run into that path anymore. However, this could
> be problematic if we want the other CPUs to panic. Do we care?
> 
> * machine_kexec successful -> doesn't matter

We can do so, but I think always resetting panic_cpu would be
simpler and safer.

Although checking kexec_crash_image each time is pointless, it
doesn't cause any actual problem.
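
To illustrate the always-reset behavior, here is a minimal userspace
sketch, not the patch itself. It assumes C11 atomics in place of the
kernel's atomic_cmpxchg(). The names crash_kexec_sketch(),
fake_crash_kexec(), crash_image_loaded and kexec_attempts are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Stand-in for the kernel's panic_cpu; -1 means "no owner". */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/* Hypothetical stand-in for "kexec_crash_image != NULL". */
static bool crash_image_loaded;

/* Counts how often the inner function was entered (illustration only). */
static int kexec_attempts;

/*
 * Hypothetical stand-in for __crash_kexec().  With no image loaded it
 * simply returns; the real function would not return after a
 * successful machine_kexec().
 */
static void fake_crash_kexec(void)
{
	kexec_attempts++;
	if (!crash_image_loaded)
		return;
}

/* Returns true if this CPU won the race and ran the inner function. */
static bool crash_kexec_sketch(int this_cpu)
{
	int expected = PANIC_CPU_INVALID;

	assert(this_cpu >= 0);

	/* Only the first CPU to claim panic_cpu may proceed. */
	if (!atomic_compare_exchange_strong(&panic_cpu, &expected, this_cpu))
		return false;

	fake_crash_kexec();

	/*
	 * Always reset, whatever the reason for returning.  A later call
	 * repeats the image check, which is redundant but harmless.
	 */
	atomic_store(&panic_cpu, PANIC_CPU_INVALID);
	return true;
}
```

With this shape the caller needs no knowledge of why the inner
function returned, so there is no error-code path to keep in sync.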

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group