[PATCH 00/11] arm64: entry lockdep/rcu/tracing fixes

Mark Rutland mark.rutland at arm.com
Thu Nov 26 07:35:51 EST 2020


Hi,

Dmitry and Marco both reported some weirdness with lockdep on arm64 erroneously
reporting the hardware IRQ state, and inexplicable RCU stalls:

  https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
  https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com

Having investigated, I believe that this is largely down to the arm64 entry
code not correctly managing RCU, lockdep, irq flag tracing, and context
tracking. This series attempts to fix those cases, and I've Cc'd folk from the
previous threads as a heads-up.

Today, the arm64 entry code:

* Doesn't correctly save/restore the lockdep/tracing view of the HW IRQ
  state, leaving this inconsistent.

* Doesn't correctly wake/sleep RCU around its use (e.g. by the IRQ tracing
  functions).

* Calls the context tracking functions (which wake and sleep RCU) at the wrong
  point with respect to lockdep and tracing.

Fixing all this requires reworking the entry/exit sequences along the lines of
the generic/x86 entry code. Moving arm64 over to the generic entry code
requires significant changes to both arm64 and the generic code, so for now I've
added arm64-specific helpers to achieve the same thing. There's a lot of
cleanup we could do here as a follow-up, but for now I've tried to do the bare
minimum to make things work as expected without making it unmaintainable.
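As a rough illustration of the ordering constraints above, here is a minimal sketch: a hypothetical userspace C model (not kernel code, and not the code in this series) of the user->kernel->user ordering used by the generic entry code, i.e. lockdep_hardirqs_off(), then user_exit_irqoff(), then trace_hardirqs_off_finish() on entry, and the mirror image on exit. The struct cpu_model type and every model_* function are invented for illustration. The only rules modelled are that IRQ tracing needs RCU to be watching and that context tracking only runs once lockdep believes IRQs are off; lockdep_hardirqs_on_prepare() is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model (not kernel code): just enough state to
 * check the ordering rules for user<->kernel transitions.
 */
struct cpu_model {
	bool hw_irqs_masked;  /* real hardware IRQ mask */
	bool lockdep_irqs_on; /* lockdep's view of the IRQ state */
	bool trace_irqs_on;   /* IRQ-flag tracing's view of the IRQ state */
	bool rcu_watching;    /* false in user mode (extended quiescent state) */
	int violations;       /* ordering rules broken so far */
};

static void require(struct cpu_model *c, bool ok)
{
	if (!ok)
		c->violations++;
}

static struct cpu_model model_user_state(void)
{
	/* In userspace: IRQs enabled in every view, RCU not watching. */
	return (struct cpu_model){ .lockdep_irqs_on = true, .trace_irqs_on = true };
}

/* Stands in for lockdep_hardirqs_off(): must follow the HW, needs no RCU. */
static void model_lockdep_off(struct cpu_model *c)
{
	require(c, c->hw_irqs_masked);
	c->lockdep_irqs_on = false;
}

/* Stands in for trace_hardirqs_{off_finish,on_prepare}(): need RCU. */
static void model_trace(struct cpu_model *c, bool irqs_on)
{
	require(c, c->rcu_watching);
	c->trace_irqs_on = irqs_on;
}

/* Stands in for user_{exit,enter}_irqoff(): lockdep must see IRQs off. */
static void model_context_tracking(struct cpu_model *c, bool to_kernel)
{
	require(c, !c->lockdep_irqs_on);
	c->rcu_watching = to_kernel; /* leaving user mode wakes RCU */
}

/* Entry order modelled on the generic enter_from_user_mode(). */
static void model_enter_from_user_mode(struct cpu_model *c)
{
	c->hw_irqs_masked = true;        /* exception entry masks IRQs */
	model_lockdep_off(c);            /* 1. sync lockdep with the HW */
	model_context_tracking(c, true); /* 2. wake RCU */
	model_trace(c, false);           /* 3. tracing is now safe */
}

/* Exit order modelled on the generic exit_to_user_mode(). */
static void model_exit_to_user_mode(struct cpu_model *c)
{
	model_trace(c, true);             /* 1. trace while RCU still watches */
	model_context_tracking(c, false); /* 2. RCU stops watching */
	c->lockdep_irqs_on = true;        /* 3. lockdep last (lockdep_hardirqs_on) */
	c->hw_irqs_masked = false;        /* exception return unmasks IRQs */
}

/* A broken entry order: tracing before RCU has been woken. */
static void model_enter_broken(struct cpu_model *c)
{
	c->hw_irqs_masked = true;
	model_trace(c, false);           /* violation: RCU not watching yet */
	model_lockdep_off(c);
	model_context_tracking(c, true);
}

/* Returns the number of violations for a full user->kernel->user trip. */
static int model_round_trip(void)
{
	struct cpu_model c = model_user_state();

	model_enter_from_user_mode(&c);
	model_exit_to_user_mode(&c);
	return c.violations;
}

/* Returns the number of violations for the broken entry order. */
static int model_broken_entry(void)
{
	struct cpu_model c = model_user_state();

	model_enter_broken(&c);
	return c.violations;
}
```

In this model, model_round_trip() reports no violations, while model_broken_entry(), which calls into tracing before context tracking has woken RCU, reports exactly one. The real kernel has many more constraints (NMIs, kernel<->kernel transitions, debug exceptions) that this sketch deliberately ignores.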

The patches are based on v5.10-rc3, and I've pushed them out to my
arm64/entry-fixes branch on kernel.org:

  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/entry-fixes

Marco was able to test a WIP version of this, which seemed to address the
issues he was seeing. Since then I've had to alter the debug exception
handling, but I'm not expecting problems there. In future we'll want to make
more changes to the debug cases to align with x86, handling single-step,
watchpoints, and breakpoints as NMIs, but this will require significant
refactoring of the way we handle BRKs. For now I don't believe that there's a
major problem in practice with the approach taken in this series.

This version has seen an overnight soak under Syzkaller, where all the reports
I have so far look sound. I have been testing with additional debug patches:
  
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/entry-fixes

... which I do not think we should merge now, but intend to respin in future
with all the other cleanup.

While investigating this, Peter and I spotted a latent issue in the core idle
code, for which Peter has a patch queued in the tip locking/urgent branch:

  https://lore.kernel.org/r/20201120114925.594122626@infradead.org

... which the second patch in this series refers to.

Thanks,
Mark.

Mark Rutland (11):
  arm64: syscall: exit userspace before unmasking exceptions
  arm64: mark idle code as noinstr
  arm64: entry: mark entry code as noinstr
  arm64: entry: move enter_from_user_mode to entry-common.c
  arm64: entry: prepare ret_to_user for function call
  arm64: entry: move el1 irq/nmi logic to C
  arm64: entry: fix non-NMI user<->kernel transitions
  arm64: ptrace: prepare for EL1 irq/rcu tracking
  arm64: entry: fix non-NMI kernel<->kernel transitions
  arm64: entry: fix NMI {user,kernel}->kernel transitions
  arm64: entry: fix EL1 debug transitions

 arch/arm64/include/asm/daifflags.h |   3 +
 arch/arm64/include/asm/exception.h |   5 +
 arch/arm64/include/asm/ptrace.h    |   7 ++
 arch/arm64/kernel/entry-common.c   | 246 +++++++++++++++++++++++++++----------
 arch/arm64/kernel/entry.S          |  78 ++++--------
 arch/arm64/kernel/irq.c            |  15 ---
 arch/arm64/kernel/process.c        |   8 +-
 arch/arm64/kernel/sdei.c           |   7 +-
 arch/arm64/kernel/syscall.c        |   1 -
 arch/arm64/kernel/traps.c          |  22 ++--
 arch/arm64/mm/fault.c              |  25 ----
 11 files changed, 237 insertions(+), 180 deletions(-)

-- 
2.11.0