[PATCHv2 09/11] arm64: entry: fix non-NMI kernel<->kernel transitions

Zenghui Yu yuzenghui at huawei.com
Mon Apr 26 14:39:04 BST 2021


Hi Mark,

On 2021/4/26 17:21, Mark Rutland wrote:
> On Sun, Apr 25, 2021 at 01:29:31PM +0800, Zenghui Yu wrote:
>> Hi Mark,
> 
> Hi Zenghui,

[...]

>> Booting a lockdep-enabled kernel with "irqchip.gicv3_pseudo_nmi=1" would
>> result in splats as below:
>>
>> | DEBUG_LOCKS_WARN_ON(!irqs_disabled())
>> | WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8
>> | Modules linked in:
>> | CPU: 3 PID: 125 Comm: modprobe Tainted: G        W         5.12.0-rc8+ #463
>> | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>> | pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--)
>> | pc : lockdep_hardirqs_off+0xd4/0xe8
>> | lr : lockdep_hardirqs_off+0xd4/0xe8
>> | sp : ffff80002a39bad0
>> | pmr_save: 000000e0
>> | x29: ffff80002a39bad0 x28: ffff0000de214bc0
>> | x27: ffff0000de1c0400 x26: 000000000049b328
>> | x25: 0000000000406f30 x24: ffff0000de1c00a0
>> | x23: 0000000020400005 x22: ffff8000105f747c
>> | x21: 0000000096000044 x20: 0000000000498ef9
>> | x19: ffff80002a39bc88 x18: ffffffffffffffff
>> | x17: 0000000000000000 x16: ffff800011c61eb0
>> | x15: ffff800011700a88 x14: 0720072007200720
>> | x13: 0720072007200720 x12: 0720072007200720
>> | x11: 0720072007200720 x10: 0720072007200720
>> | x9 : ffff80002a39bad0 x8 : ffff80002a39bad0
>> | x7 : ffff8000119f0800 x6 : c0000000ffff7fff
>> | x5 : ffff8000119f07a8 x4 : 0000000000000001
>> | x3 : 9bcdab23f2432800 x2 : ffff800011730538
>> | x1 : 9bcdab23f2432800 x0 : 0000000000000000
>> | Call trace:
>> |  lockdep_hardirqs_off+0xd4/0xe8
>> |  enter_from_kernel_mode.isra.5+0x7c/0xa8
>> |  el1_abort+0x24/0x100
>> |  el1_sync_handler+0x80/0xd0
>> |  el1_sync+0x6c/0x100
>> |  __arch_clear_user+0xc/0x90
>> |  load_elf_binary+0x9fc/0x1450
>> |  bprm_execve+0x404/0x880
>> |  kernel_execve+0x180/0x188
>> |  call_usermodehelper_exec_async+0xdc/0x158
>> |  ret_from_fork+0x10/0x18
>>
>> The code that triggers the splat is lockdep_hardirqs_off+0xd4/0xe8:
>>
>> |	/*
>> |	 * So we're supposed to get called after you mask local IRQs, but for
>> |	 * some reason the hardware doesn't quite think you did a proper job.
>> |	 */
>> |	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>> |		return;
>>
>> which looks like a false positive, as DAIF are all masked on taking
>> a synchronous exception and hardirqs are therefore disabled. With
>> pseudo-NMI in use, irqs_disabled() takes the value of ICC_PMR_EL1 as
>> the interrupt enable state, which is GIC_PRIO_IRQON (0xe0) in this
>> case, so it reports IRQs as enabled. I haven't dug further, though.
> 
> Thanks for this report. I think I understand the problem.
> 
> In some paths (e.g. el1_dbg, el0_svc) we update the PMR with
> (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET) before we notify lockdep, but in
> others (e.g. el1_abort) we do not. The cases where we do not are where
> lockdep will warn, since IRQs will be masked by DAIF but not the PMR, as
> you describe above.
> 
> With the current PMR management scheme, we'll need to consistently
> update the PMR earlier in the entry code. Does the below diff help?

[...]

> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 6acfc5e6b5e0..7d46c74a8706 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -292,6 +292,8 @@ alternative_else_nop_endif
>  alternative_if ARM64_HAS_IRQ_PRIO_MASKING
>  	mrs_s	x20, SYS_ICC_PMR_EL1
>  	str	x20, [sp, #S_PMR_SAVE]
> +	orr	x20, x20, #GIC_PRIO_PSR_I_SET
> +	msr_s	SYS_ICC_PMR_EL1, x20
>  alternative_else_nop_endif

While this does fix the lockdep part, it breaks something else. The
sleep-in-atomic splat below stands out (I've seen other splats
triggered with this diff as well): irqs_disabled() in do_mem_abort()
now gets confused by the updated PMR (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET).
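
To make the arithmetic concrete, here is a minimal user-space sketch in
C of how the PMR value could map to irqs_disabled() with pseudo-NMI. It
is a model under stated assumptions, not the kernel code: the model_*
helpers are hypothetical, GIC_PRIO_IRQON is 0xe0 as quoted above,
GIC_PRIO_PSR_I_SET = 0x10 is an assumption chosen because it matches
"irqs_disabled(): 16" in the splat, and the XOR comparison is an assumed
simplification of the real check.

```c
#include <assert.h>

/* 0xe0 is the value quoted in this thread. */
#define GIC_PRIO_IRQON      0xe0u
/* Assumption: bit 4, consistent with "irqs_disabled(): 16" in the splat. */
#define GIC_PRIO_PSR_I_SET  0x10u

/*
 * Hypothetical model of irqs_disabled() under pseudo-NMI: the result is
 * non-zero whenever the PMR differs from GIC_PRIO_IRQON.
 */
static inline unsigned int model_irqs_disabled(unsigned int pmr)
{
	return pmr ^ GIC_PRIO_IRQON;
}

/*
 * Hypothetical model of the two instructions added by the diff: the
 * saved PMR is left untouched, and the live PMR gains GIC_PRIO_PSR_I_SET.
 */
static inline unsigned int model_entry_pmr(unsigned int saved_pmr)
{
	return saved_pmr | GIC_PRIO_PSR_I_SET;
}

/* Walks through both situations discussed in this thread. */
static inline void model_self_check(void)
{
	unsigned int pmr = GIC_PRIO_IRQON;

	/* Without the diff: DAIF is masked, but the PMR says "IRQs on". */
	assert(model_irqs_disabled(pmr) == 0);

	/* With the diff: the PMR is 0xf0, so the model reports 16. */
	pmr = model_entry_pmr(pmr);
	assert(pmr == 0xf0u);
	assert(model_irqs_disabled(pmr) == 16);
}
```

In this model the first case corresponds to the lockdep warning (IRQs
look enabled on exception entry), and the second to the might_sleep()
complaint (IRQs look disabled in a context that previously had them
enabled).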

Please have a look.


Thanks,
Zenghui

| BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:582
| in_atomic(): 0, irqs_disabled(): 16, non_block: 0, pid: 512, name: sh
| CPU: 2 PID: 512 Comm: sh Tainted: G S      W         5.12.0+
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| Call trace:
|  dump_backtrace+0x0/0x210
|  show_stack+0x2c/0x38
|  dump_stack+0x150/0x1c4
|  ___might_sleep+0x154/0x250
|  __might_sleep+0x58/0x90
|  do_page_fault+0x24c/0x498
|  do_mem_abort+0x50/0xc0
|  el1_abort+0x50/0x100
|  el1_sync_handler+0x80/0xd0
|  el1_sync+0x74/0x100
|  schedule_tail+0xa0/0xd0
|  ret_from_fork+0x4/0x18


