[PATCH] panic, kexec: Don't mutex_trylock() in __crash_kexec()

Fri Jun 17 06:52:27 PDT 2022

On Fri 2022-06-17 12:52:05, Valentin Schneider wrote:
> Hi Tao,
> 
> On 17/06/22 18:42, Tao Zhou wrote:
> > Hi Valentin,
> >
> > On Thu, Jun 16, 2022 at 01:37:09PM +0100, Valentin Schneider wrote:
> >> @@ -964,24 +966,31 @@ late_initcall(kexec_core_sysctl_init);
> >>   */
> >>  void __noclone __crash_kexec(struct pt_regs *regs)
> >>  {
> >> -	/* Take the kexec_mutex here to prevent sys_kexec_load
> >> -	 * running on one cpu from replacing the crash kernel
> >> -	 * we are using after a panic on a different cpu.
> >> +	/*
> >> +	 * This should be taking kexec_mutex before doing anything with the
> >> +	 * kexec_crash_image, but this code can be run in NMI context which
> >> +	 * means we can't even trylock.
> >>  	 *
> >> -	 * If the crash kernel was not located in a fixed area
> >> -	 * of memory the xchg(&kexec_crash_image) would be
> >> -	 * sufficient.  But since I reuse the memory...
> >> +	 * Pairs with smp_mb() in do_kexec_load() and sys_kexec_file_load()
> >>  	 */
> >> -	if (mutex_trylock(&kexec_mutex)) {
> >> -		if (kexec_crash_image) {
> >> -			struct pt_regs fixed_regs;
> >> -
> >> -			crash_setup_regs(&fixed_regs, regs);
> >> -			crash_save_vmcoreinfo();
> >> -			machine_crash_shutdown(&fixed_regs);
> >> -			machine_kexec(kexec_crash_image);
> >> -		}
> >> -		mutex_unlock(&kexec_mutex);
> >> +	WRITE_ONCE(panic_wants_kexec, true);
> >> +	smp_mb();
> >> +	/*
> >> +	 * If we're panic'ing while someone else is messing with the crash
> >> +	 * kernel, this isn't going to end well.
> >> +	 */
> >> +	if (READ_ONCE(kexec_loading)) {
> >> +		WRITE_ONCE(panic_wants_kexec, false);
> >> +		return;
> >> +	}
> >
> > So this is from NMI. The mutex guarantee that kexec_file_load() or 
> > do_kexec_load() just one of them beat on cpu. NMI can happen on more
> > than one cpu. That means that here be cumulativity here IMHO.
> >
> 
> If you look at __crash_kexec() in isolation yes, but if you look at panic()
> and nmi_panic() only a single NMI can get in there. I think that is also
> true for invocations via crash_kexec().

It is true that panic() could be called only once, see this code
in panic():

	 * Only one CPU is allowed to execute the panic code from here. For
	this_cpu = raw_smp_processor_id();
	old_cpu  = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);

	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
		panic_smp_self_stop();

One the other hand, the aproach with two variables is strage
and brings exactly these questions.

If a trylock is enough that the mutex can be replaced by two
simple atomic operations. The mutex would be needed only
when a user really would need to wait for another one.

	atomic_t crash_kexec_lock;

	/* trylock part */
	if (atomic_cmpxchg_acquire(&crash_kexec_lock, 0, 1) != 0)
		return -EBUSY;

	/* do anything guarded by crash_kexec_lock */

	/* release lock */
	atomic_set_release(&crash_kexec_lock, 0);

The _acquire, _release variants will do the barriers correctly.

Best Regards,
Petr