[PATCH printk v5 1/1] printk: extend console_lock for per-console locking
Marek Szyprowski
m.szyprowski at samsung.com
Mon May 2 16:13:19 PDT 2022
Hi Petr,
On 02.05.2022 15:17, Petr Mladek wrote:
> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>> On 30.04.2022 18:00, John Ogness wrote:
>>> On 2022-04-29, Marek Szyprowski <m.szyprowski at samsung.com> wrote:
>>>> The same issue happens if I boot with init=/bin/bash
>>> Very interesting. Since you are seeing all the output up until you try
>>> doing something, I guess receiving UART data is triggering the issue.
>> Right, this is how it looks like.
>>
>>>> I found something really interesting. When lockup happens, I'm still
>>>> able to log via ssh and trigger any magic sysrq action via
>>>> /proc/sysrq-trigger.
>>> If you boot the system and directly login via ssh (without sending any
>>> data via serial), can you trigger printk output on the UART? For
>>> example, with:
>>>
>>> echo hello > /dev/kmsg
>>>
>>> (You might need to increase your loglevel to see it.)
>> Data written to /dev/kmsg and all kernel logs were always displayed
>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>> properly on the console. The latter doesn't however trigger the input
>> related activity.
>>
>> It looks that the data read from the uart is delivered only if other
>> activity happens on the kernel console. If I type 'reboot' and press
>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>> ssh then, I only see the date printed on the console. However if I type
>> 'date >/dev/kmsg', the the date is printed and reboot happens.
> This is really interesting.
>
> 'date >/dev/kmsg' should be handled like a normal printk().
> It should get pushed to the console using printk kthread,
> that calls call_console_driver() that calls con->write()
> callback. In our case, it should be meson_serial_console_write().
>
> I am not sure if meson_serial_console_write() is used also
> when writing via /dev/ttyAML0.
>
>>>> It turned out that the UART console is somehow blocked, but it
>>>> receives and buffers all the input. For example after issuing "echo
>>>> >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>> updated and I see the magic sysrq banner and then all the commands I
>>>> blindly typed in the UART console! However this doesn't unblock the
>>>> console.
>>> sysrq falls back to direct printing. This would imply that the kthread
>>> printer is somehow unable to print.
>>>
>>>> Here is the output of 't' magic sys request:
>>>>
>>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>> corrupt.
>> Right, good catch. For comparison, here is a 't' sysrq result from the
>> 'working' serial console (next-20220429), which happens usually 1 of 4
>> boots:
>>
>> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> Strange. The backtrace is weird here too:
>
> [ 50.514509] task:pr/ttyAML0 state:R running task stack: 0 pid: 65 ppid: 2 flags:0x00000008
> [ 50.514540] Call trace:
> [ 50.514548] __switch_to+0xe8/0x160
> [ 50.514561] meson_serial_console+0x78/0x118
>
> There should be kthread() and printk_kthread_func() on the stack.
>
> Hmm, meson_serial_console+0x78/0x118 is weird on its own.
> meson_serial_console is the name of the structure. I would
> expect a name of the .write callback, like
> meson_serial_console_write()
Well, this sounds a bit like a stack corruption. For the reference, I've
checked what magic sysrq 't' reports for my other test boards:
RaspberryPi4:
[ 166.702431] task:pr/ttyS0 state:R stack: 0 pid: 64
ppid: 2 flags:0x00000008
[ 166.711069] Call trace:
[ 166.713647] __switch_to+0xe8/0x160
[ 166.717216] __schedule+0x2f4/0x9f0
[ 166.720862] log_wait+0x0/0x50
[ 166.724081] task:vfio-irqfd-clea state:I stack: 0 pid: 65
ppid: 2 flags:0x00000008
[ 166.732698] Call trace:
ARM Juno R1:
[ 74.356562] task:pr/ttyAMA0 state:R running task stack: 0
pid: 47 ppid: 2 flags:0x00000008
[ 74.356605] Call trace:
[ 74.356617] __switch_to+0xe8/0x160
[ 74.356637] amba_console+0x78/0x118
[ 74.356657] task:kworker/2:1 state:I stack: 0 pid: 48
ppid: 2 flags:0x00000008
[ 74.356695] Workqueue: 0x0 (mm_percpu_wq)
[ 74.356738] Call trace:
QEMU virt/arm64:
[ 174.155760] task:pr/ttyAMA0 state:S stack: 0 pid: 26
ppid: 2 flags:0x00000008
[ 174.156305] Call trace:
[ 174.156529] __switch_to+0xe8/0x160
[ 174.157131] 0xffff5ebbbfdd62d8
In the last case it doesn't happen always. In the other runs I got the
following log from QEMU virt/arm64:
[ 200.537579] task:pr/ttyAMA0 state:S stack: 0 pid: 26
ppid: 2 flags:0x00000008
[ 200.538121] Call trace:
[ 200.538341] __switch_to+0xe8/0x160
[ 200.538583] __schedule+0x2f4/0x9f0
[ 200.538822] schedule+0x54/0xd0
[ 200.539047] printk_kthread_func+0x2d8/0x3bc
[ 200.539301] kthread+0x118/0x11c
[ 200.539523] ret_from_fork+0x10/0x20
I hope that at least the qemu case will let you to analyze it by
yourself. I run my test system with the following command:
qemu-system-aarch64 -kernel virt/Image -append "console=ttyAMA0
root=/dev/vda rootwait" -M virt -cpu cortex-a57 -smp 2 -m 1024 -device
virtio-blk-device,drive=virtio-blk0 -drive
file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw -netdev
user,id=user -device virtio-net-device,netdev=user -display none
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
More information about the linux-amlogic
mailing list