[arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction

Luis Machado luis.machado at linaro.org
Mon Jan 13 10:13:19 PST 2020


Ping?

On 11/26/19 1:35 PM, Luis Machado wrote:
> ping?
> 
> On 11/18/19 11:54 AM, Luis Machado wrote:
>> Hi Will,
>>
>> Thanks for the thorough explanation.
>>
>> On 11/18/19 10:15 AM, Will Deacon wrote:
>>> Hi Luis,
>>>
>>> [+Mark for the valid_user_regs() part]
>>>
>>> On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
>>>> I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
>>>> request by GDB won't execute the underlying instruction. As a 
>>>> consequence,
>>>> the PC doesn't move, but we return a SIGTRAP just like we would for a
>>>> regular successful PTRACE_SINGLESTEP request.
>>>>
>>>> Since there are no software breakpoints inserted at PC (we are actually
>>>> stepping over a breakpoint, so GDB removes the breakpoint at PC before
>>>> issuing a PTRACE_SINGLESTEP request), this is an odd behavior.
>>>>
>>>> Though not too harmful, i see this manifesting in the GDB testsuite
>>>> (gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
>>>> think it is further in the instruction stream than it really is. In 
>>>> fact, we
>>>> get lucky here and no FAIL's show up, only many more spurious PASSes.
>>>
>>> I managed to reproduce this locally and I think I've figured out what's
>>> going on, although I'm not sure that the kernel is the best place to fix
>>> it.
>>>
>>> Looking at the specific reproducer:
>>>
>>>> Execute gdb like so:
>>>>
>>>> gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex 
>>>> "record" -ex
>>>> "si" -ex "rsi" -ex "record stop" insn-reverse
>>>
>>> So we've got a couple of instructions as follows (it doesn't actually 
>>> matter
>>> what they are, so I've changed the LD1 in your binary for a NOP in 
>>> order to
>>> avoid confusion with the "load" label not actually pointing at a load):
>>>
>>>     0x7b8:        mov    // "load"
>>>     0x7bc:        nop
>>>
>>> "b load" places a breakpoint at 0x7b8:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>>
>>> We run to a software breakpoint on "load" (the mov instruction). We take
>>> the trap and try to execute the "si", which means we need to remove the
>>> breakpoint while we step over it:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>>     [...]
>>>     ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0
>>>
>>> This causes the kernel to arm the single-step state machine so that
>>> MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). 
>>> Running
>>> an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
>>> SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the 
>>> trap to
>>> trigger, at which point gdb puts the breakpoint instruction back 
>>> since the
>>> step is complete:
>>
>> So, just to confirm my understanding, we have a couple bits 
>> controlling single-stepping in the kernel, one in MDSCR_EL1 and 
>> another in SPSR_EL1. GDB doesn't have direct access to any of those, 
>> correct?
>>
>> Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR.
>>
>> The transition from active-not-pending to active-pending takes place 
>> via a single PTRACE_SINGLESTEP request? Is that correct?
>>
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0
>>>
>>> This is where things start to go wrong. The "rsi" command attempts to
>>> perform a reverse step, which means restoring the old state when we were
>>> previously executing at 0x7b8. It starts by removing the breakpoint 
>>> again,
>>> since we've already hit that:
>>>
>>>     ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
>>>
>>> and then resets the CPU registers to their old values:
>>>
>>>     (I don't know why it does this three times)
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>     ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
>>> [{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
>>>
>>> The problem with this is that we have moved the PC back to 0x7b8 but 
>>> we have
>>> also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen 
>>> stepping
>>> get disabled (this usually happens by PTRACE_CONT calling
>>> user_disable_single_step()) which means that MDSCR_EL1.SS remains set 
>>> to 1
>>> and we're in the active-pending state! Consequently, we immediately 
>>> take a
>>> step exception if a step operation is attempted >
>>
>> While trying to reproduce this, i was paying attention to the SS bit 
>> coming and going. But in the particular sequence of si/rsi, within the 
>> record boundaries, i see GDB just restored the original CPSR value to 
>> what it was before we processed the si command.
>>
>>  From GDB's POV all state was restore to the way it was before and 
>> we're good to go.
>>
>> Is this not enough to restore state kernel-wise?
>>
>>> Now, we *could* consider hacking the TIF_SINGLESTEP check in
>>> valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is 
>>> active
>>> but this is a user-visible change and may break things like stepping 
>>> out of
>>> signal handlers. I would prefer that GDB manages the SS bit 
>>> explicitly in
>>> this scenario, by setting it to 1 when restoring the old state in the
>>> reverse step, a bit like when it disables the old breakpoint. You can
>>> emulate this by doing:
>>
>> I think we could let GDB control this when required, but I'm trying to 
>> understand the ramifications of letting GDB do so.
>>
>> For example, what if the user decides to alter the PC here and there, 
>> for debugging purposes. That is a use case that happens often, in 
>> order to go back or skip some parts of the code.
>>
>> Would we need to pay attention to the SS bit in those cases as well?
>>
>>>
>>>     (gdb) set $cpsr |= (1<<21)
>>
>> In particular, what does the switching of this bit accomplishes in the 
>> kernel? Would we be better off forcing the SS bit every time we do a 
>> single-step operation, for example?



More information about the linux-arm-kernel mailing list