[PATCH v4] panic: Fixes the panic_print NMI backtrace setting

Wed Feb 15 07:39:22 PST 2023

On 14/02/2023 11:46, Petr Mladek wrote:
> [...]
>> My understanding is that it's a mechanism to prevent some concurrency,
>> in case some other CPU modify this variable while panic() is running.
>> I find it very unlikely, hence I removed it - but if people consider
>> this copy needed, I can respin this patch and keep it, even providing a
>> comment about that, in order to be explict about its need.
> 
> Yes, I think that it makes the behavior consistent even when the
> global variable manipulated in parallel.
> 
> I would personally prefer to keep the local copy. Better safe
> than sorry.
> 

Hi Petr, thanks for your review!
OK, we could keep this local copy, makes sense...even adding a comment,
to make its purpose really clear.

>> [...]
>> @@ -211,9 +211,6 @@ static void panic_print_sys_info(bool console_flush)
>>  		return;
>>  	}
>>  
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> -		trigger_all_cpu_backtrace();
>> -
> 
> Sigh, this is yet another PANIC_PRINT_ action that need special
> timing. We should handle both the same way.
> 
> What about the following? The parameter @mask says what
> actions are allowed at the given time.
> < ..code..> 

I think your approach is interesting, it's very "organized".

But I think it's a bit conflicting with that purpose we had on notifiers
refactor, to deprecate "bogus" usages of panic_print, as in
https://lore.kernel.org/lkml/20220427224924.592546-26-gpiccoli@igalia.com/ .

So, the idea of my approach is to allow:

(a) Easy removal of panic_print_sys_info() of panic(), once we move it
to a panic notifier;

(b) Better separate and identify the "bogus" cases. The CPU backtrace
one is less a bogus case in my opinion, more a "complicated" one, since
it's related with the CPUs stop routines. But the console flush, as we
discussed, it's clearly something that calls for a new parameter (and
such param was added in the refactor patch).

In the end, I think your approach is interesting but it's kinda like
we're adding the fix to later, on refactor, entirely remove/rework it.
With my approach we wouldn't be calling panic_print_sys_info() again
(3rd time!) on panic(), and also would be more natural to move it later
to a new panic notifier.

What you / others think? If your approach is in the end preferred, it's
fine by me - I'd just ask you to submit as a full patch so we can get it
merged as a fix in 6.3, if possible (and backport it to the 6.1/6.2
stable). Now, if my approach is fine, I can resubmit as a V5 keeping the
local variable - lemme know.

Cheers,

Guilherme