[PATCH v2 00/11] arm64: debug: remove hook registration, split exception entry

Ada Couprie Diaz ada.coupriediaz at arm.com
Wed May 28 03:38:06 PDT 2025


On 16/05/2025 12:57, Luis Claudio R. Goncalves wrote:

> On Tue, May 13, 2025 at 04:19:26PM +0100, Ada Couprie Diaz wrote:
>> Re-sending with proper text format, apologies for the noise...
>>
>> On 13/05/2025 13:25, Luis Claudio R. Goncalves wrote:
>>
>>> On Mon, May 12, 2025 at 06:43:15PM +0100, Ada Couprie Diaz wrote:
>>>> [...]
>>>>
>>>> Single Step Exception
>>>> ===
>>>>
>> Hi Luis,
>>
>> Thanks for taking the time to test, I'm glad it seems OK for now.
>>> Is there any specific test you would like me to run on that test setup I
>>> have?
>> There are a couple of edge-cases that might be problematic if my conclusions
>> are wrong:
>>   1. Race between a step exception being taken, and the related
>>      hardware breakpoint/watchpoint being removed
>>   2. Migration of a task stepping a CPU-bound breakpoint/watchpoint
>> [...]
> I ran the two tests you listed above, along with some variations just to
> make sure I got the details right, and all those tests completed flawlessly
> on both machines, on the 4 kernel configurations I tested (all with
> PREEMPT_RT enabled, with and without LOCKDEP and assorted debug features).
Thanks a lot for taking the time to test so exhaustively! I'm happy to
hear that this part is holding up: I am confident it should be OK.
>>>> Testing examples
>>>> ===
>>>> [...]
>>>>
>>>> GDB commands (for EL0):
>>>> ~~~
>>>> [...]
> This is the only test where I (consistently) hit backtraces. If I run the
> test with "gdb -x ${COMMAND_LIST_FILE} ..." I get a single backtrace, every
> time:
>
> [  263.890424] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> [  263.890444] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 5744, name: gdb_prog1
> [  263.890445] preempt_count: 1, expected: 0
> [  263.890446] RCU nest depth: 0, expected: 0
> [  263.890447] 1 lock held by gdb_prog1/5744:
> [  263.890448]  #0: ffff100028496f58 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x30/0x150
> [  263.890468] Preemption disabled at:
> [  263.890469] [<ffff8000800391a8>] debug_exception_enter+0x18/0x78
> [  263.890484] CPU: 114 UID: 0 PID: 5744 Comm: gdb_prog1 Tainted: G        W           6.15.0-rc6-rt1__dbg #2 PREEMPT_{RT,(lazy)}
> [  263.890487] Tainted: [W]=WARN
> [  263.890488] Hardware name: Supermicro ARS-221GL-NR-01/G1SMH, BIOS 2.0 07/12/2024
> [  263.890490] Call trace:
> [  263.890492]  show_stack+0x30/0x88 (C)
> [  263.890495]  dump_stack_lvl+0xa0/0xe0
> [  263.890498]  dump_stack+0x14/0x2c
> [  263.890499]  __might_resched+0x170/0x240
> [  263.890506]  rt_spin_lock+0x6c/0x1a0
> [  263.890512]  force_sig_info_to_task+0x30/0x150
> [  263.890513]  force_sig_fault+0x68/0xa0
> [  263.890515]  arm64_force_sig_fault+0x44/0x80
> [  263.890518]  send_user_sigtrap+0x60/0xa8
> [  263.890520]  do_brk64+0x40/0x88
> [  263.890522]  el0_brk64+0x50/0x1c0
> [  263.890526]  el0t_64_sync_handler+0x60/0xe0
> [  263.890528]  el0t_64_sync+0x184/0x188
>
> Quite similar to the problem originally reported, where sending signals
> with preemption disabled could trigger the "rtlock_might_resched();" check
> if CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

Oh, indeed: I can confirm that this happens both with my series and on
mainline, at tags v6.15-rc6 and v6.15.

I didn't see it originally, but as you point out it shows up 
consistently with CONFIG_DEBUG_ATOMIC_SLEEP enabled.
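
For readers following along, here is a rough userspace model of why the
check fires. It is only a sketch: every `model_*` name is a hypothetical
stand-in rather than a kernel symbol, and the only facts taken from the
trace above are that preemption is disabled in debug_exception_enter() and
that the siglock is taken through rt_spin_lock() while it is still
disabled. The second ordering is there purely for contrast and is not
meant to describe what the eventual fix will look like.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins for kernel state; NOT the real kernel variables. */
static int model_preempt_count;
static int model_splats;	/* number of "sleeping function" reports */

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models the debug check done before taking a lock that may sleep. */
static void model_might_resched(const char *where)
{
	if (model_preempt_count != 0) {
		model_splats++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context at %s\n",
			where);
		fprintf(stderr, "preempt_count: %d, expected: 0\n",
			model_preempt_count);
	}
}

/* On PREEMPT_RT a spinlock_t may sleep, so locking runs the check. */
static void model_rt_spin_lock(void)
{
	model_might_resched("model_rt_spin_lock");
}

/* Sending the signal takes the siglock, as in the reported call trace. */
static void model_send_sigtrap(void)
{
	model_rt_spin_lock();
}

/* Ordering seen in the trace: signal sent while preemption is disabled. */
static int model_brk_signal_inside(void)
{
	model_splats = 0;
	model_preempt_disable();	/* stands in for debug_exception_enter() */
	model_send_sigtrap();
	model_preempt_enable();
	return model_splats;
}

/* Contrast case only: signal sent once preemption is enabled again. */
static int model_brk_signal_outside(void)
{
	model_splats = 0;
	model_preempt_disable();
	/* work that really needs preemption disabled would go here */
	model_preempt_enable();
	model_send_sigtrap();
	return model_splats;
}
```

Calling model_brk_signal_inside() reports one splat with preempt_count at
1, which matches the header of the backtrace above, while
model_brk_signal_outside() reports none.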

> If I call gdb and run manually the sequence of commands you described, I
> get the backtrace above three times. The only difference is that on the
> second backtrace I get these extra elements on the header:
>
>    [48052.129422] RCU nest depth: 1, expected: 1
>    [48052.129424] 2 locks held by gdb_prog1/27451:
>    [48052.129425]  #0: ffff8000828315c8 (rcu_read_lock){....}-{1:3}, at: breakpoint_handler+0xd8/0x318
>    [48052.129439]  #1: ffff00008abd92d8 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x30/0x150
>
> So, when I enter manually the GDB command you suggested, the result is:
>
>      start           <--- Backtrace#1:   preempt_count: 1
>      hbreak 3
>      watch target
>      commands 2
>      continue
>      end
>      commands 3
>      continue
>      end
>      continue        <--- Backtrace#2:   preempt_count: 1   RCU nest depth: 1
>      jump 11         <--- Backtrace#3:   preempt_count: 1
>      continue
>      quit
>
> I hope this report is helpful.

Very much so, thanks!

I am looking into fixing this in v3, as I feel this series is a good
opportunity to do so.

> IMHO, even with these backtraces, there was a considerable enhancement when
> compared to the original scenario we reported.
>
> Best regards,
> Luis

I'm glad that the fix holds up well under heavier testing.

Best regards,
Ada



