Problem with nbcon console and amba-pl011 serial port
John Ogness
john.ogness at linutronix.de
Fri Jun 6 09:58:22 PDT 2025
On 2025-06-06, Petr Mladek <pmladek at suse.com> wrote:
>> What if during non-panic-CPU shutdown, we allow reacquires to succeed
>> only for _direct_ acquires? The below diff shows how this could be
>> implemented. Since it only supports direct acquires, it does not violate
>> any state rules. And also, since it only involves the reacquire, there
>> is no ugly battling for console printing between the panic and non-panic
>> CPUs.
>
> Interesting idea. I thought a lot about it, see below.
>
>
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 5b462029d03c1..d58ebdc8170b3 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index b0b9a8bf4560d..8f572630c9f7e 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -304,6 +310,8 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>> smp_send_stop();
>> else
>> crash_smp_send_stop();
>> +
>> + nbcon_panic_allow_reacquire_set(false);
>> }
>
> I have two concerns here:
>
> 1. I wonder whether this is reliable enough. It seems that
> smp_send_stop() waits at least 1 sec until the CPUs
> get stopped. But is this enough on virtualized systems?
>
> 2. It might increase a risk when CPUs are stopped using NMI.
> The change would allow a non-panic CPU to reacquire the ownership
> and enter _unsafe_ section right before being stopped by NMI.
>
>
> The 1st problem might be avoided by allowing the reacquire all
> the time unless an "unsafe" takeover happened.
>
> The 2nd problem is worse. But allowing the reacquire all the time
> might actually help as well.
>
> Note that the information about the "unsafe_takeover" is stored
> in struct console so that we even won't need a new global
> variable.
That is a good idea. We should add unsafe_takeover as a condition as
well. That can only happen way after the smp_send_stop() anyway. But it
means that only the atomic printing code would ever need to worry about
unsafe_takeover (assuming a driver were to implement some sort of
handling of it).
>> /**
>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index d60596777d278..d960cb8a05558 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -235,7 +235,8 @@ static void nbcon_seq_try_update(struct nbcon_context *ctxt, u64 new_seq)
>> * the handover acquire method.
>> */
>> static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>> - struct nbcon_state *cur)
>> + struct nbcon_state *cur,
>> + bool ignore_other_cpu_in_panic)
>> {
>> unsigned int cpu = smp_processor_id();
>> struct console *con = ctxt->console;
>> @@ -249,7 +250,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>> * nbcon_waiter_matches(). In particular, the assumption that
>> * lower priorities are ignored during panic.
>> */
>> - if (other_cpu_in_panic())
>> + if (other_cpu_in_panic() && !ignore_other_cpu_in_panic)
>
> If you agree with allowing the reacquire all the time then I would
> rename the parameter to @is_reacquire and do something like:
>
> if (other_cpu_in_panic() &&
> (!is_reacquire || cur->unsafe_takeover))
Looks fine to me.
>> return -EPERM;
>>
>> if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>> @@ -913,7 +920,7 @@ void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
>> {
>> struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
>>
>> - while (!nbcon_context_try_acquire(ctxt))
>> + while (!nbcon_context_try_acquire(ctxt, READ_ONCE(nbcon_panic_allow_reacquire)))
>
> And here it would be:
>
> while (!nbcon_context_try_acquire(ctxt, true))
Right.
>> cpu_relax();
>>
>> nbcon_write_context_set_buf(wctxt, NULL, 0);
>
>
> Summary:
>
> I open to give this alternative approach a chance when we allow the
> reacquire all the time. It might work well. And it won't require
> any special "panic" handling in all console drivers.
Agreed. Thanks for being open about this approach. I will put together
an official patch.
John
More information about the linux-arm-kernel
mailing list