[V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

Thu Dec 3 03:29:21 PST 2015

> On Thu, Dec 03, 2015 at 02:01:38AM +0000, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > On Wed, Dec 02, 2015 at 11:57:38AM +0000, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > > We can do so, but I think resetting panic_cpu always would be
> > > > simpler and safer.
> >
> > I'll state in detail.
> >
> > When we call crash_kexec() without entering panic() and return from
> > it, panic() should be called eventually.
> 
> Huh, the call chain is
> 
> panic->crash_kexec
> 
> Or do you mean, when crash_kexec() is not called by panic() but by some
> of its other callers?

I was arguing about the case of oops_end --> crash_kexec
--> return from crash_kexec because of !kexec_crash_image -->
panic.

In the case of panic --> __crash_kexec, __crash_kexec is called
only once, so we don't need to check the return value of __crash_kexec
as you suggested.  So I thought you stated about crash_kexec --> panic
case.

> > But the code paths are a bit complicated and there are many
> > implementations for each architecture. So one day, this assumption may
> > be broken; the CPU doesn't call panic(). Or the CPU may fail to call
> > panic() because we are already in insane state. It would be nervous,
> > but allowing another CPU to process panic routines by resetting
> > panic_cpu is safer approach.
> 
> My suggestion was to do this only on the panic path - not necessarily on
> the others.
> 
> > Since this code is executed only once due to panic_cpu,
> > I think introducing this logic is not much valuable.
> > Also, current implementation is already quite simple:
> >
> > panic()
> > {
> > ...
> > 	__crash_kexec(NULL) {
> > 		if (mutex_trylock(&kexec_mutex)) {
> > 			if (kexec_crash_image) {
> > 				/* don't return */
> > 			}
> 
> I don't mean the kexec_crash_image case - I mean the opposite one:
> !kexec_crash_image.

I also mentioned !kexec_crash_image case...

> And I think I know now what you're trying to tell
> me: the first CPU which hits panic, will finish panic eventually and so
> it will take down the machine.

No.  The first CPU calls panic, and then it calls __crash_kexec.
Because of !kexec_crash_image, it returns from __crash_kexec and
continues to the panic procedure.  At the same time, another CPU
tries to call panic(), but it doesn't run the panic procedure;
panic_cpu prevents the second CPU from running it.
This means __crash_kexec is called only once even if we don't
check the return value of __crash_kexec.
(Please note that crash_kexec can be called multiple times in the
case of oops_end() --> crash_kexec().)

I'm sorry I couldn't tell my thought well.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group