Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?

Simon Marchi simon.marchi at ericsson.com
Mon May 30 10:48:11 PDT 2016


Hello knowledgeable ARM people!

(Background: https://sourceware.org/ml/gdb/2016-05/msg00020.html )

Debugging a flaky GDB test case on ARM led me to think there might
be a race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM
(PTRACE_SETVFPREGS is ARM-specific anyway).  The test case (and the
reproducer below) changes the value of a VFP register (let's say d0)
using PTRACE_SETVFPREGS and resumes the thread with PTRACE_CONT.  It
happens intermittently that the thread resumes execution with the
old value in d0 instead of the new one.

Here is a minimal reproducing example.

test.S:

  .global _start
  _start:
      vldr.64 d0, constant
      vldr.64 d1, constant

  break_here:
      vcmp.f64 d0, d1
      vmrs APSR_nzcv, fpscr

      # Exit code
      moveq r0, #1
      movne r0, #0

      # Exit syscall
      mov r7, #1
      svc 0

  .align 8
  constant:
  .word 0xc8b43958
  .word 0x40594676

Built with:

  $ gcc -g3 -O0 -o test test.S -nostdlib

And the gdb script, test.gdb:

  file test
  b break_here
  run
  p $d0 = 4.0
  c

The test is run with:

  $ ./gdb -nx -x test.gdb -batch

The test loads the same constant in d0 and d1.  It then does a comparison between
them and exits with 1 (failure) if they are the same, 0 (success) if they are different.
The GDB script breaks at "break_here", tries to change the value of d0 to some other
constant (4.0) and lets the program continue and exit.  If our register write succeeded,
the program should exit with 0 (values are different).  If our register write failed, the
program will exit with 1 (values are still the same).

The result is that I randomly see both cases, hinting at a race between the register write
and the time where the kernel restores the thread's VFP registers.  Note that when GDB's
affinity is pinned to a single core, I do not see the failure.  Also, note that when I
remove the vldr.64 instructions, I can't seem to reproduce the problem, so it looks
like they are somehow important.
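
Since the failure is intermittent, it can help to run the script many times and
count the outcomes.  The following is only a sketch: tally_runs is a hypothetical
helper name, and it assumes GDB reports the inferior's exit as "exited normally"
(exit code 0, the write took effect) or "exited with code 01" (d0 still held the
old value).  Adjust the patterns if your GDB prints something different.

```shell
# Hypothetical helper: run a command N times and tally how the inferior exited,
# based on the messages GDB is assumed to print in batch mode.
tally_runs() {
    n=$1; shift
    ok=0; stale=0; other=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Capture both stdout and stderr of one run.
        out=$("$@" 2>&1)
        case $out in
            *"exited normally"*)     ok=$((ok + 1)) ;;       # exit code 0: d0 != d1
            *"exited with code 01"*) stale=$((stale + 1)) ;; # exit code 1: d0 == d1
            *)                       other=$((other + 1)) ;; # anything unexpected
        esac
        i=$((i + 1))
    done
    echo "write-took-effect=$ok stale-d0=$stale other=$other"
}
```

For example, "tally_runs 100 ./gdb -nx -x test.gdb -batch" gives the unpinned
numbers, and "tally_runs 100 taskset -c 0 ./gdb -nx -x test.gdb -batch" repeats
the experiment with GDB (and the inferior it spawns, which inherits the affinity
mask) pinned to core 0.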

I see this behavior on 3 different boards:

- ODroid XU-4, kernel 3.10.96
- Firefly RK3288, kernel 3.10.0
- Raspberry Pi 2, kernel 4.4.8

Any ideas about this problem?

Thanks,

Simon



More information about the linux-arm-kernel mailing list