Problem with nbcon console and amba-pl011 serial port
Toshiyuki Sato (Fujitsu)
fj6611ie at fujitsu.com
Wed Jun 4 23:22:32 PDT 2025
Hi Michael,
> From: Michael Kelley <mhklinux at outlook.com>
> Sent: Thursday, June 5, 2025 11:49 AM
> > Hi Michael, John,
> >
>
> [snip]
>
> >
> > This is a proposed fix to force termination by returning false from
> > nbcon_reacquire_nobuf when a panic occurs within pl011_console_write_thread.
> > (I believe this is similar to what John suggested in his previous
> > reply.)
> >
> > While I couldn't reproduce the issue using sysrq-trigger in my
> > environment (it seemed that the panic was being executed before the
> > printer thread could run), I did observe nbcon_reacquire_nobuf failing to
> > complete when injecting an NMI (SError) during pl011_console_write_thread.
> > Applying this fix seems to have resolved the "SMP: failed to stop
> > secondary CPUs" issue.
> >
> > This patch is for testing.
> > Modifications to the imx and other drivers, as well as adding
> > __must_check, will likely be required.
> >
> > Michael, could you please test this fix in your environment?
>
> I've tested the fix in my primary environment (an ARM64 VM in the Azure cloud), and I've seen no failures to stop a CPU. I kept my
> custom logging in place, so I could confirm that the problem path is still being hit and that the fix recovers from it.
> So the good results are not just due to a timing change. The "pr/ttyAMA0" task is still looping forever trying to get ownership
> of the console, but it is doing so at a higher level, in nbcon_kthread_func() and its calls to nbcon_emit_one(), and interrupts are
> enabled for part of the loop.
>
> Full disclosure: I have a secondary environment, also an ARM64 VM in the Azure cloud, but running on an older version of
> Hyper-V. In this environment I see the same custom logging results, and the "pr/ttyAMA0" task is indeed looping with
> interrupts enabled. But for some reason, the CPU doesn't stop in response to IPI_CPU_STOP. I don't see any evidence that this
> failure to stop is due to the Linux pl011 driver or nbcon. This older version of Hyper-V has a known problem in pl011 UART
> emulation, and I have a theory on how that problem may be causing the failure to stop. It will take me some time to investigate
> further, but based on what I know now, that investigation should not hold up this fix.
>
> Michael
Thank you for testing the patch.
I'm concerned about the thread looping...
Regards,
Toshiyuki Sato
More information about the linux-arm-kernel mailing list