[PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
Khaja Hussain Shaik Khaji
khaja.khaji at oss.qualcomm.com
Mon Apr 20 00:05:52 PDT 2026
On Mon, Mar 02, 2026 at 01:38:35PM +0000, Mark Rutland wrote:
> That suggests that something is going wrong *within* your entry handler
> that causes IRQs to be unmasked unexpectedly.
>
> Please can we find out *exactly* where IRQs get unmasked for the first
> time?
Thanks for the pointer -- that was the right direction to look.
You are correct. I confirmed that arm64_enter_el1_dbg() does NOT re-enable
IRQs; it only manages lockdep and context-tracking state. The IRQ unmask
originates entirely within our kretprobe entry_handler itself.
The exact call chain is:
  pre_handler_kretprobe()
    entry_dwc3_gadget_pullup()       <- kretprobe entry_handler
      dwc3_msm_notify_event()
        _raw_spin_unlock_irq()       <- first IRQ unmask (spin_unlock_irq)
dwc3_msm_notify_event() is called from within the entry_handler while
holding a spinlock acquired with spin_lock_irq() (i.e. IRQs were disabled
on lock, and re-enabled unconditionally on unlock via spin_unlock_irq /
_raw_spin_unlock_irq). This is the first point at which IRQs become
unmasked.
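To illustrate why the unlock is the unmask point, here is a minimal
userspace sketch. It is a model only, not kernel code: a single flag stands
in for the CPU's IRQ mask, and every model_* and handler_* name is
hypothetical. It assumes only the documented semantics that spin_unlock_irq()
enables IRQs unconditionally, while spin_unlock_irqrestore() restores the
state recorded by spin_lock_irqsave().

```c
#include <stdbool.h>

/* Userspace model only: one flag stands in for the CPU's IRQ mask. */
static bool irqs_masked;

/* Models spin_lock_irq(): masks IRQs without recording the prior state. */
void model_spin_lock_irq(void)
{
	irqs_masked = true;
}

/* Models spin_unlock_irq(): unmasks IRQs unconditionally. */
void model_spin_unlock_irq(void)
{
	irqs_masked = false;
}

/* Models spin_lock_irqsave(): returns the prior mask state, then masks. */
bool model_spin_lock_irqsave(void)
{
	bool was_masked = irqs_masked;

	irqs_masked = true;
	return was_masked;
}

/* Models spin_unlock_irqrestore(): restores the recorded mask state. */
void model_spin_unlock_irqrestore(bool was_masked)
{
	irqs_masked = was_masked;
}

/*
 * Hypothetical handler entered with IRQs masked, using the _irq pair.
 * Returns the mask state seen by the rest of the handler after the unlock.
 */
bool handler_using_irq_pair(void)
{
	irqs_masked = true;           /* entry state, as in the report */
	model_spin_lock_irq();
	model_spin_unlock_irq();      /* IRQs become unmasked here */
	return irqs_masked;
}

/* Same hypothetical handler, using the _irqsave/_irqrestore pair. */
bool handler_using_irqsave_pair(void)
{
	bool was_masked;

	irqs_masked = true;           /* entry state, as in the report */
	was_masked = model_spin_lock_irqsave();
	model_spin_unlock_irqrestore(was_masked);
	return irqs_masked;           /* still masked: prior state restored */
}
```

In this model, handler_using_irq_pair() leaves IRQs unmasked, and
handler_using_irqsave_pair() leaves them masked. Whether the irqsave variant
is appropriate for the real driver path is a separate question that this
sketch does not answer.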
From that point, a hardware IRQ fires, softirq processing runs, and
kprobe_flush_task() -> kprobe_busy_begin()/end() is invoked while the
kretprobe entry_handler is still on the stack -- triggering the cur_kprobe
corruption described in the patch.
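The sequence above can be sketched as a simplified userspace model. It
assumes that the busy begin/end pair overwrites the per-CPU "current kprobe"
slot on entry and clears it on exit, without saving the previous value. The
real functions also manage preemption and control-block state, which is
omitted here. All model_* names are hypothetical.

```c
#include <stddef.h>

/* Stand-in for struct kprobe; only identity matters in this model. */
struct model_kprobe {
	const char *name;
};

static struct model_kprobe model_busy = { "busy" };

/* Stands in for the per-CPU current-kprobe slot. */
static struct model_kprobe *model_cur_kprobe;

/* Models kprobe_busy_begin(): overwrites the slot, saving nothing. */
void model_kprobe_busy_begin(void)
{
	model_cur_kprobe = &model_busy;
}

/* Models kprobe_busy_end(): clears the slot instead of restoring it. */
void model_kprobe_busy_end(void)
{
	model_cur_kprobe = NULL;
}

/* Models the IRQ/softirq path that reaches the busy begin/end pair. */
void model_irq_path(void)
{
	model_kprobe_busy_begin();
	model_kprobe_busy_end();
}

/*
 * Models a probe hit: the slot is set to the active probe, the entry
 * handler runs, and an IRQ may arrive if the handler unmasked IRQs.
 * Returns what the slot holds when the handler finishes.
 */
struct model_kprobe *model_run_entry_handler(struct model_kprobe *p,
					     int irq_fires)
{
	model_cur_kprobe = p;         /* probe becomes the current kprobe */
	if (irq_fires)
		model_irq_path();     /* re-entrant begin/end clobbers slot */
	return model_cur_kprobe;
}
```

With irq_fires set to 0 the slot still points at the active probe when the
handler returns. With irq_fires set to 1 it has been cleared to NULL, so the
remaining kprobe handling for that hit would no longer see the probe it
started with.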
Regarding documentation: the kprobes documentation in
Documentation/trace/kprobes.rst (section "Kretprobe entry-handler") does
not mention any restriction on enabling IRQs within an entry_handler. The
only constraint documented is:
"Probe handlers are run with preemption disabled or interrupt disabled,
which depends on the architecture and optimization state."
This applies to kprobe/kretprobe handlers in general, but there is no
explicit warning that, on arm64, an entry_handler must not re-enable IRQs.
Given that entry_handlers are user-supplied callbacks, a note in that
section would help future users avoid this class of bug.
As for the fix itself: we plan to carry this as a downstream patch for our
platform. We are not planning to push it upstream at this time.
Thanks again for the detailed review.
Khaja