[PATCH v3 0/2] arm64: Fix pending single-step debugging issues

Sumit Garg sumit.garg at linaro.org
Mon Jul 11 05:43:59 PDT 2022


Hi Doug,

On Sat, 2 Jul 2022 at 03:44, Doug Anderson <dianders at chromium.org> wrote:
>
> Hi,
>
> On Tue, May 10, 2022 at 11:05 PM Sumit Garg <sumit.garg at linaro.org> wrote:
> >
> > This patch-set reworks pending fixes from Wei's series [1] to make
> > single-step debugging via kgdb/kdb on arm64 work as expected. There was
> > a prior discussion on ML [2] regarding if we should keep the interrupts
> > enabled during single-stepping. So patch #1 follows suggestion from Will
> > [3] to not disable interrupts during single stepping but rather skip
> > single stepping within interrupt handler.
> >
> > [1] https://lore.kernel.org/all/20200509214159.19680-1-liwei391@huawei.com/
> > [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
> > [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> >
> > Changes in v3:
> > - Reword commit descriptions as per Daniel's suggestions.
> >
> > Changes in v2:
> > - Replace patch #1 to rather follow Will's suggestion.
> >
> > Sumit Garg (2):
> >   arm64: entry: Skip single stepping into interrupt handlers
> >   arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
> >
> >  arch/arm64/include/asm/debug-monitors.h |  1 +
> >  arch/arm64/kernel/debug-monitors.c      |  5 +++++
> >  arch/arm64/kernel/entry-common.c        | 18 +++++++++++++++++-
> >  arch/arm64/kernel/kgdb.c                |  2 ++
> >  4 files changed, 25 insertions(+), 1 deletion(-)
>
> Sorry it took so long for me to respond. I kept dreaming that I'd find
> the time to really dig deep into this to understand it fully and I'm
> finally giving up on it.

No worries and apologies on my part as well as I had to find some time
to reproduce the issue that you have reported below.

> I'm going to hope that Will and/or Catalin
> knows this area of the code well and can give it a good review. If not
> then I'll strive harder to make the time...
>
> In any case, I poked around with this a bunch and it definitely
> improved the stepping behavior a whole lot. I still got one case where
> gdb hit an assertion while I was stepping, but I could believe that
> was a problem with gdb? I couldn't reproduce it. Thus I can at least
> give:
>
> Tested-by: Douglas Anderson <dianders at chromium.org>
>

Thanks for the testing.

> I'll also note that I _think_ I remember that with Wei's series that
> the gdb function "call" started working. I tried that here and it
> didn't seem so happy. To keep things simple, I created a dummy
> function in my kernel that looked like:
>
> void doug_test(void)
> {
>   pr_info("testing, 1 2 3\n");
> }
>
> I broke into the debugger by echoing "g" to /proc/sysrq-trigger and
> then tried "call doug_test()". I guess my printout actually printed
> but it wasn't so happy after that. Seems like it somehow ended up
> returning to a bogus address after the call which then caused a crash.
>

I am able to reproduce this issue on my setup as well. But it doesn't
seem to be a regression caused by this patch-set over Wei's series. As
I could reproduce this issue with v1 [1] patch-set as well which was
just a forward port of pending patches from Wei's series to the latest
upstream.

Maybe it's a different regression caused by other changes? BTW, do you
remember the kernel version you tested with Wei's series applied?

[1] https://lore.kernel.org/linux-arm-kernel/20220411093819.1012583-1-sumit.garg@linaro.org/T/

-Sumit

>   testing, 1 2 3
>   BUG: sleeping function called from invalid context at
> arch/arm64/mm/fault.c:593
>   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3393, name: bash
>   preempt_count: 0, expected: 0
>   RCU nest depth: 1, expected: 0
>   CPU: 6 PID: 3393 Comm: bash Not tainted 5.19.0-rc4+ #3
> dbec0bdb8582e447bccdcf2e70d7fe04477b1aac
>   Hardware name: Google Herobrine (rev1+) (DT)
>   Call trace:
>    dump_backtrace+0xf0/0x110
>    show_stack+0x24/0x70
>    dump_stack_lvl+0x64/0x7c
>    dump_stack+0x18/0x38
>    __might_resched+0x144/0x154
>    __might_sleep+0x54/0x84
>    do_page_fault+0x1d4/0x42c
>    do_mem_abort+0x4c/0xb0
>    el1_abort+0x3c/0x5c
>    el1h_64_sync_handler+0x4c/0xc4
>    el1h_64_sync+0x64/0x68
>    0xffffffc008000000
>    __handle_sysrq+0x15c/0x184
>    write_sysrq_trigger+0x94/0x128
>    proc_reg_write+0xbc/0xec
>    vfs_write+0xf0/0x2c8
>    ksys_write+0x84/0xf0
>    __arm64_sys_write+0x28/0x34
>    invoke_syscall+0x4c/0x120
>    el0_svc_common+0x94/0xfc
>    do_el0_svc+0x38/0xc0
>    el0_svc+0x2c/0x7c
>    el0t_64_sync_handler+0x48/0x114
>    el0t_64_sync+0x18c/0x190
>   Unable to handle kernel execute from non-executable memory at
> virtual address ffffffc008000000
>   Mem abort info:
>     ESR = 0x000000008600000f
>     EC = 0x21: IABT (current EL), IL = 32 bits
>     SET = 0, FnV = 0
>     EA = 0, S1PTW = 0
>     FSC = 0x0f: level 3 permission fault
>   swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000082863000
>   [ffffffc008000000] pgd=100000027ffff003, p4d=100000027ffff003,
> pud=100000027ffff003, pmd=100000027fffe003, pte=00680001001c3703
>   Internal error: Oops: 8600000f [#1] PREEMPT SMP
>
> I'm not sure if that's a sign that something is missing with your patch or not.
>
> -Doug



More information about the linux-arm-kernel mailing list