[PATCH v2 00/11] arm64: debug: remove hook registration, split exception entry

Fri May 16 04:57:05 PDT 2025

On Tue, May 13, 2025 at 04:19:26PM +0100, Ada Couprie Diaz wrote:
> Re-sending with proper text format, apologies for the noise...
> 
> On 13/05/2025 13:25, Luis Claudio R. Goncalves wrote:
> 
> > On Mon, May 12, 2025 at 06:43:15PM +0100, Ada Couprie Diaz wrote:
> > > [...]
> > > 
> > > Single Step Exception
> > > ===
...
> Hi Luis,
> 
> Thanks for taking the time to test, I'm glad it seems OK for now.
> > Is there any specific test you would like me to run on that test setup I
> > have?
> 
> There are a couple of edge-cases that might be problematic if my conclusions
> are wrong:
>   1. Race between a step exception being taken, and the related
>      hardware breakpoint/watchpoint being removed
>   2. Migration of a task stepping a CPU-bound breakpoint/watchpoint
> 
> I have been stress testing them on an AMD Seattle board with 4 cores, but
> more extensive testing is always welcome.
> 
> I'll describe my testing below, but it is a bit messy and might be unclear,
> my apologies.
> 
> I have been using the following very rough program (compiled with -O0) :
> 
...
> 
> Which runs continuously, repeatedly changing a fixed address, so that a
> hardware watchpoint can be set externally via perf and be CPU-bound :
> 
> 	perf stat -C $CPU -emem:0x6000000000/8:w
> 
> 
> So to test 1. I run perf in a loop, with --timeout 10 so that it
> adds/removes the watchpoint repeatedly, one for each CPU.
> 
> 	while true; do perf stat --timeout 10 -C $CPU -emem:0x6000000000/8:w ; done
> 
> 
> My machine has 4 hardware watchpoints, so I can cover all cores and see that
> the counts are consistent, even if the target task switches cores.
> (It is never 0 on all cores, no errors are produced, it is consistent with
> the count when perf is run on all cores (-a) rather than core-by-core, or
> with the task PID (-p).)
> 
> To test 2. I again set one perf monitor per CPU, this time without timeout,
> and then load the system to try to force preemption (with ssdd for example),
> similarly waiting for inconsistencies, errors, or the count stopping.
> 
> However, this might be more difficult if the number of cores is much greater
> than the number of hardware watchpoints.
> For 1. the task could be pinned to a core, but for 2. the task could be
> limited to as many cores as the system has hardware watchpoints.

I ran the two tests you listed above, along with some variations just to
make sure I got the details right, and all those tests completed flawlessly
on both machines, on the 4 kernel configurations I tested (all with
PREEMPT_RT enabled, with and without LOCKDEP and assorted debug features).
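
In case it helps anyone reproduce test 1, here is a rough sketch of one way
to drive the per-CPU add/remove loops. It is not the exact script used for
the runs above. The perf options are taken from the quoted commands; CPUS,
ROUNDS, DRY_RUN and the function names are assumptions made up for
illustration.

```shell
#!/bin/bash
# Sketch only: one add/remove watchpoint loop per CPU (test 1).
# WATCH_ADDR and the perf options mirror the quoted commands.
# CPUS, ROUNDS and DRY_RUN are hypothetical knobs for illustration.
WATCH_ADDR=${WATCH_ADDR:-0x6000000000}

run() {
        # With DRY_RUN set, print the command instead of executing it.
        if [ -n "${DRY_RUN:-}" ]; then
                echo "$*"
        else
                "$@"
        fi
}

stress_cpu() {
        # Add and remove the watchpoint on CPU $1, ROUNDS times.
        local cpu=$1 i
        for ((i = 0; i < ${ROUNDS:-100}; i++)); do
                run perf stat --timeout 10 -C "$cpu" \
                        -e "mem:${WATCH_ADDR}/8:w" || return 1
        done
}

run_all() {
        # One background loop per CPU listed in CPUS, then wait for all.
        local cpu
        for cpu in ${CPUS:-0 1 2 3}; do
                stress_cpu "$cpu" &
        done
        wait
}
```

Calling `CPUS="0 1 2 3" run_all` would start the loops; on machines with more
cores than hardware watchpoints, CPUS would have to be limited accordingly.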

> Hopefully that makes sense, but I understand it's a bit involved.
> 
...
> > > Testing examples
> > > ===
> > > 
> > > Perf (for EL1):
> > > ~~~
> > > Assuming that `perf` is on your $PATH and building with `kallsyms`
> > > 
> > >    #!/bin/bash
> > >    watch_addr=$(sudo cat /proc/kallsyms | grep "D jiffies$" | cut -f1 -d\  )
> > >    break_addr=$(sudo cat /proc/kallsyms | grep "clock_nanosleep$" | cut -f1 -d\  )
> > >    cmd="sleep 0.01"
> > >    sudo perf stat -a -e mem:0x${watch_addr}/8:w -e mem:0x${break_addr}:x ${cmd}
> > > 
> > > NB: This does /not/ test EL1 BRKs.
> > > 
> > > 
> > > GDB commands (for EL0):
> > > ~~~
> > > The following C example, compiled with `-g -O0`
> > > 
> > >    int main() {
> > >            int add = 0xAA;
> > >            int target = 0;
> > > 
> > >            target += add;
> > > 
> > >    #ifdef COMPAT
> > >        __asm__("BKPT");
> > >    #else
> > >        __asm__("BRK 1");
> > >    #endif
> > >            return target;
> > >    }
> > > 
> > > Combined with the following GDB command-list
> > > 
> > >    start
> > >    hbreak 3
> > >    watch target
> > >    commands 2
> > >    continue
> > >    end
> > >    commands 3
> > >    continue
> > >    end
> > >    continue
> > >    jump 11
> > >    continue
> > >    quit
> > > 
> > > Executed as such : `gdb -x ${COMMAND_LIST_FILE} ./a.out`
> > > should go through the whole program, return 0252/170/0xAA, and
> > > exercise all EL0 debug exception entries.

This is the only test where I (consistently) hit backtraces. If I run the
test with "gdb -x ${COMMAND_LIST_FILE} ..." I get a single backtrace, every
time:

[  263.890424] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[  263.890444] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 5744, name: gdb_prog1
[  263.890445] preempt_count: 1, expected: 0
[  263.890446] RCU nest depth: 0, expected: 0
[  263.890447] 1 lock held by gdb_prog1/5744:
[  263.890448]  #0: ffff100028496f58 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x30/0x150
[  263.890468] Preemption disabled at:
[  263.890469] [<ffff8000800391a8>] debug_exception_enter+0x18/0x78
[  263.890484] CPU: 114 UID: 0 PID: 5744 Comm: gdb_prog1 Tainted: G        W           6.15.0-rc6-rt1__dbg #2 PREEMPT_{RT,(lazy)}
[  263.890487] Tainted: [W]=WARN
[  263.890488] Hardware name: Supermicro ARS-221GL-NR-01/G1SMH, BIOS 2.0 07/12/2024
[  263.890490] Call trace:
[  263.890492]  show_stack+0x30/0x88 (C)
[  263.890495]  dump_stack_lvl+0xa0/0xe0
[  263.890498]  dump_stack+0x14/0x2c
[  263.890499]  __might_resched+0x170/0x240
[  263.890506]  rt_spin_lock+0x6c/0x1a0
[  263.890512]  force_sig_info_to_task+0x30/0x150
[  263.890513]  force_sig_fault+0x68/0xa0
[  263.890515]  arm64_force_sig_fault+0x44/0x80
[  263.890518]  send_user_sigtrap+0x60/0xa8
[  263.890520]  do_brk64+0x40/0x88
[  263.890522]  el0_brk64+0x50/0x1c0
[  263.890526]  el0t_64_sync_handler+0x60/0xe0
[  263.890528]  el0t_64_sync+0x184/0x188

Quite similar to the problem originally reported, where sending signals
with preemption disabled could trigger the "rtlock_might_resched();" check
if CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
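
Since the splat only shows up with the right debug options, a quick check of
the running kernel's config can save some time. A minimal sketch, assuming
the config is available as a plain-text file (for example
/boot/config-$(uname -r), which not every distribution ships); has_opt is a
hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report whether a kernel config option is built in (=y).
# has_opt is a hypothetical helper; $1 is the config file, $2 is the
# option name without the CONFIG_ prefix.
has_opt() {
        grep -q "^CONFIG_$2=y\$" "$1"
}
```

For example:

	cfg=/boot/config-$(uname -r)
	has_opt "$cfg" PREEMPT_RT && has_opt "$cfg" DEBUG_ATOMIC_SLEEP && echo "splat expected"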

If I call gdb and manually run the sequence of commands you described, I
get the backtrace above three times. The only difference is that the
second backtrace has these extra elements in the header:

  [48052.129422] RCU nest depth: 1, expected: 1
  [48052.129424] 2 locks held by gdb_prog1/27451:
  [48052.129425]  #0: ffff8000828315c8 (rcu_read_lock){....}-{1:3}, at: breakpoint_handler+0xd8/0x318
  [48052.129439]  #1: ffff00008abd92d8 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x30/0x150

So, when I manually enter the GDB commands you suggested, the result is:

    start           <--- Backtrace#1:   preempt_count: 1
    hbreak 3
    watch target
    commands 2
    continue
    end
    commands 3
    continue
    end
    continue        <--- Backtrace#2:   preempt_count: 1   RCU nest depth: 1
    jump 11         <--- Backtrace#3:   preempt_count: 1
    continue
    quit
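
To compare the number of backtraces between the scripted and the manual run,
the kernel log can be counted before and after each one. A small sketch,
assuming the log is readable via dmesg; count_splats is a hypothetical
helper name.

```shell
#!/bin/bash
# Sketch: count "sleeping function called from invalid context" splats
# in a kernel log read from stdin. count_splats is a hypothetical helper.
count_splats() {
        # grep -c prints 0 and exits non-zero when nothing matches, so
        # the exit status is ignored and only the count is kept.
        grep -c 'BUG: sleeping function called from invalid context' || true
}
```

Used as `sudo dmesg | count_splats` before and after a gdb session, the
difference between the two counts is the number of backtraces it triggered.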

I hope this report is helpful.

IMHO, even with these backtraces, this is a considerable improvement
compared to the original scenario we reported.

Best regards,
Luis

> > > By using a cross-compiler and passing an additional `-DCOMPAT` argument
> > > during compilation, the `BKPT32` path can also be tested.
> > > NOTE: `BKPT` *will* make GDB loop infinitely, that is expected. Sending
> > > SIGINT to GDB will break the loop and the execution should complete.
> > > 
> > > [...]
> 
---end quoted text---