[V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

Thu Dec 3 01:35:44 PST 2015

On Thu, Dec 03, 2015 at 02:01:38AM +0000, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > On Wed, Dec 02, 2015 at 11:57:38AM +0000, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > We can do so, but I think resetting panic_cpu always would be
> > > simpler and safer.
> 
> Let me explain in more detail.
> 
> When we call crash_kexec() without entering panic() and return from
> it, panic() should be called eventually.

Huh, the call chain is

panic->crash_kexec

Or do you mean when crash_kexec() is called not by panic() but by one
of its other callers?

> But the code paths are a bit complicated, and there are many
> implementations, one for each architecture. So one day this assumption may
> be broken, i.e. the CPU doesn't call panic(). Or the CPU may fail to call
> panic() because we are already in an insane state. It may be overly
> cautious, but allowing another CPU to process the panic routines by
> resetting panic_cpu is a safer approach.

My suggestion was to do this only on the panic path - not necessarily on
the others.
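The reset-on-return idea from the quoted paragraph can be sketched as a small userspace C11 model. This is not the patch: model_crash_kexec(), model_crash_kexec_body() and crash_body_runs are hypothetical names, atomic_compare_exchange_strong() stands in for the kernel's atomic_cmpxchg(), and the body models only the !kexec_crash_image case, where __crash_kexec() returns to its caller. It assumes, as in the quoted snippet further down, that panic() calls the unguarded __crash_kexec() itself, so the reset only happens on direct crash_kexec() calls.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model, not kernel code: panic_cpu holds the id of
 * the CPU that owns the panic/crash path, or PANIC_CPU_INVALID if none. */
#define PANIC_CPU_INVALID (-1)

static atomic_int panic_cpu = PANIC_CPU_INVALID;
static int crash_body_runs; /* how many times the guarded body executed */

/* Stands in for __crash_kexec() in the !kexec_crash_image case, where it
 * returns instead of booting the crash kernel. */
static void model_crash_kexec_body(void)
{
	crash_body_runs++;
}

/* Models a direct crash_kexec() call, i.e. one not made from panic(). */
static void model_crash_kexec(int this_cpu)
{
	int expected = PANIC_CPU_INVALID;

	assert(this_cpu != PANIC_CPU_INVALID);

	/* Only the CPU that claims panic_cpu runs the body; a CPU that loses
	 * the race (e.g. another CPU is already inside panic()) just returns. */
	if (atomic_compare_exchange_strong(&panic_cpu, &expected, this_cpu)) {
		model_crash_kexec_body();
		/* The body returned, so hand panic_cpu back: if this CPU never
		 * reaches panic(), another CPU can still run the panic routines. */
		atomic_store(&panic_cpu, PANIC_CPU_INVALID);
	}
}
```

Calling model_crash_kexec(0) and then model_crash_kexec(1) runs the body twice, because the first call hands panic_cpu back. If panic_cpu is already owned by a CPU sitting in panic(), the call returns without running anything, so the two paths cannot stop each other.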

> Since this code is executed only once due to panic_cpu,
> I don't think introducing this logic adds much value.
> Also, the current implementation is already quite simple:
> 
> panic()
> {
> ...
> 	__crash_kexec(NULL) {
> 		if (mutex_trylock(&kexec_mutex)) {
> 			if (kexec_crash_image) {
> 				/* don't return */
> 			}

I don't mean the kexec_crash_image case - I mean the opposite one:
!kexec_crash_image. And I think I now know what you're trying to tell
me: the first CPU that hits panic will eventually finish panic() and so
it will take down the machine.

Every other CPU that happens to enter panic between the first CPU
entering it and the machine being taken down doesn't matter because,
well, who cares, we're panicking already.

Am I close?
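The first-CPU-wins behavior summarized above can be modeled the same way, again as a hypothetical userspace C11 sketch rather than the kernel code. model_panic_enter() is an invented name, and a false return stands in for the losing CPU parking itself (panic_smp_self_stop() in the kernel). Nothing on this path resets panic_cpu, since the winning CPU goes on to take the machine down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
#define PANIC_CPU_INVALID (-1)

static atomic_int panic_cpu = PANIC_CPU_INVALID;

/* Models the entry check in panic(): returns true if this CPU may run the
 * panic routines, false if it would stop itself because another CPU got
 * there first. */
static bool model_panic_enter(int this_cpu)
{
	int old_cpu = PANIC_CPU_INVALID;

	assert(this_cpu != PANIC_CPU_INVALID);

	/* On failure, old_cpu is overwritten with the current owner. */
	if (atomic_compare_exchange_strong(&panic_cpu, &old_cpu, this_cpu))
		return true; /* first CPU in: it owns the panic path */

	/* A nested panic on the owning CPU may continue; any other CPU
	 * loses and never runs the panic routines. */
	return old_cpu == this_cpu;
}
```

With CPU 0 in first, later calls from CPUs 1 and 2 return false, a nested call from CPU 0 returns true, and panic_cpu stays 0 throughout.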

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.