[PATCHv2 09/11] arm64: entry: fix non-NMI kernel<->kernel transitions

Zenghui Yu yuzenghui at huawei.com
Sun Apr 25 06:29:31 BST 2021


Hi Mark,

On 2020/11/30 19:59, Mark Rutland wrote:
> There are periods in kernel mode when RCU is not watching and/or the
> scheduler tick is disabled, but we can still take exceptions such as
> interrupts. The arm64 exception handlers do not account for this, and
> it's possible that RCU is not watching while an exception handler runs.
> 
> The x86/generic entry code handles this by ensuring that all (non-NMI)
> kernel exception handlers call irqentry_enter() and irqentry_exit(),
> which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
> the generic entry code, and already handle the user<->kernel transitions
> elsewhere, so we add new kernel<->kernel transition helpers along the
> lines of the generic entry code.
> 
> Since we now track interrupts becoming masked when an exception is
> taken, local_daif_inherit() is modified to track interrupts becoming
> re-enabled when the original context is inherited. To balance the
> entry/exit paths, each handler masks all DAIF exceptions before
> exit_to_kernel_mode().
> 
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: James Morse <james.morse at arm.com>
> Cc: Will Deacon <will at kernel.org>

[...]

> +/*
> + * This is intended to match the logic in irqentry_enter(), handling the kernel
> + * mode transitions only.
> + */
> +static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> +{
> +	regs->exit_rcu = false;
> +
> +	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
> +		lockdep_hardirqs_off(CALLER_ADDR0);
> +		rcu_irq_enter();
> +		trace_hardirqs_off_finish();
> +
> +		regs->exit_rcu = true;
> +		return;
> +	}
> +
> +	lockdep_hardirqs_off(CALLER_ADDR0);
> +	rcu_irq_enter_check_tick();
> +	trace_hardirqs_off_finish();
> +}

Booting a lockdep-enabled kernel with "irqchip.gicv3_pseudo_nmi=1" would
result in splats as below:

| DEBUG_LOCKS_WARN_ON(!irqs_disabled())
| WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8
| Modules linked in:
| CPU: 3 PID: 125 Comm: modprobe Tainted: G        W         5.12.0-rc8+ #463
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : lockdep_hardirqs_off+0xd4/0xe8
| lr : lockdep_hardirqs_off+0xd4/0xe8
| sp : ffff80002a39bad0
| pmr_save: 000000e0
| x29: ffff80002a39bad0 x28: ffff0000de214bc0
| x27: ffff0000de1c0400 x26: 000000000049b328
| x25: 0000000000406f30 x24: ffff0000de1c00a0
| x23: 0000000020400005 x22: ffff8000105f747c
| x21: 0000000096000044 x20: 0000000000498ef9
| x19: ffff80002a39bc88 x18: ffffffffffffffff
| x17: 0000000000000000 x16: ffff800011c61eb0
| x15: ffff800011700a88 x14: 0720072007200720
| x13: 0720072007200720 x12: 0720072007200720
| x11: 0720072007200720 x10: 0720072007200720
| x9 : ffff80002a39bad0 x8 : ffff80002a39bad0
| x7 : ffff8000119f0800 x6 : c0000000ffff7fff
| x5 : ffff8000119f07a8 x4 : 0000000000000001
| x3 : 9bcdab23f2432800 x2 : ffff800011730538
| x1 : 9bcdab23f2432800 x0 : 0000000000000000
| Call trace:
|  lockdep_hardirqs_off+0xd4/0xe8
|  enter_from_kernel_mode.isra.5+0x7c/0xa8
|  el1_abort+0x24/0x100
|  el1_sync_handler+0x80/0xd0
|  el1_sync+0x6c/0x100
|  __arch_clear_user+0xc/0x90
|  load_elf_binary+0x9fc/0x1450
|  bprm_execve+0x404/0x880
|  kernel_execve+0x180/0x188
|  call_usermodehelper_exec_async+0xdc/0x158
|  ret_from_fork+0x10/0x18

The code that triggers the splat is lockdep_hardirqs_off+0xd4/0xe8:

|	/*
|	 * So we're supposed to get called after you mask local IRQs, but for
|	 * some reason the hardware doesn't quite think you did a proper job.
|	 */
|	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
|		return;

which looks like a false positive: all of DAIF are masked on taking a
synchronous exception, so hardirqs are in fact disabled. With pseudo
NMI in use, however, irqs_disabled() takes the value of ICC_PMR_EL1 as
the interrupt enable state, which is still GIC_PRIO_IRQON (0xe0) in
this case and therefore reports interrupts as enabled. I haven't dug
further though.


Thanks,
Zenghui
