[PATCH v8 15/19] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system

Qais Yousef qais.yousef at arm.com
Fri Jun 4 05:14:29 PDT 2021


On 06/04/21 10:49, Mark Rutland wrote:
> On Thu, Jun 03, 2021 at 06:40:57PM +0100, Will Deacon wrote:
> > On Thu, Jun 03, 2021 at 01:58:56PM +0100, Mark Rutland wrote:
> > > On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote:
> > > > If we want to support 32-bit applications, then when we identify a CPU
> > > > with mismatched 32-bit EL0 support we must ensure that we will always
> > > > have an active 32-bit CPU available to us from then on. This is important
> > > > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > > > CPUs for compat tasks and forced migration due to a hotplug event will
> > > > hang if no 32-bit CPUs are available.
> > > > 
> > > > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > > > if it is 32-bit capable, or the first active 32-bit capable CPU
> > > > otherwise.
> > > > 
> > > > Reviewed-by: Catalin Marinas <catalin.marinas at arm.com>
> > > > Signed-off-by: Will Deacon <will at kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
> > > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > index 4194a47de62d..b31d7a1eaed6 100644
> > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void)
> > > >  
> > > >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> > > >  {
> > > > +	static int lucky_winner = -1;
> > > 
> > > This is cute, but could we please give it a meaningful name, e.g.
> > > `pinned_cpu` ?
> > 
> > I really don't see the problem, nor why it's "cute".
> > 
> > Tell you what, I'll add a comment instead:
> > 
> > 	/*
> > 	 * The first 32-bit-capable CPU we detected, which therefore can no
> > 	 * longer be offlined by userspace. -1 indicates we haven't yet
> > 	 * onlined a 32-bit-capable CPU.
> > 	 */
> 
> Thanks for the comment; that's helpful.
> 
> However, my concern here is that when we inevitably have to discuss this
> with others in future, "lucky winner" is jarring (and also unclear to
> those whose native language is not English). For clarity, it would be
> really nice to use a term like "cpu", "chosen_cpu", "pinned_cpu",
> etc.
> 
> That said, you're the maintainer; choose what you think is appropriate.
> 
> > > >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > > >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> > > >  
> > > >  	if (cpu_32bit) {
> > > >  		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
> > > >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > > > -		setup_elf_hwcaps(compat_elf_hwcaps);
> > > >  	}
> > > >  
> > > > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > > > +		return 0;
> > > > +
> > > > +	if (lucky_winner >= 0)
> > > > +		return 0;
> > > > +
> > > > +	/*
> > > > +	 * We've detected a mismatch. We need to keep one of our CPUs with
> > > > +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > > > +	 * every CPU in the system for a 32-bit task.
> > > > +	 */
> > > > +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > > > +							 cpu_active_mask);
> > > > +	get_cpu_device(lucky_winner)->offline_disabled = true;
> > > > +	setup_elf_hwcaps(compat_elf_hwcaps);
> > > > +	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> > > > +		cpu, lucky_winner);
> > > >  	return 0;
> > > >  }
> > > 
> > > I guess this is going to play havoc with kexec and hibernate. :/
> > 
> > The kernel can still offline the CPUs (see the whole freezer mess that I
> > linked to in the cover letter). What specific havoc are you thinking of?
> 
> Ah. If this is just inhibiting userspace-driven offlining, that sounds
> fine.
> 
> For kexec, I was concerned that either this would inhibit kexec, or
> smp_shutdown_nonboot_cpus() would fail to offline the pinned CPU, and
> that'd trigger a BUG(), which would be unfortunate.
> 
> For hibernate, the equivalent is freeze_secondary_cpus(), which I guess
> is dealt with by the freezer bits you mention.
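The pinning decision made by the quoted enable_mismatched_32bit_el0() can be modelled in user space roughly as below. This is a sketch, not kernel code: it assumes at most 64 CPUs held in plain bitmasks, and pinned_cpu, check_32bit_mismatch(), bring_up_cpu(), mask_test() and mask_first_and() are hypothetical names. mask_first_and() returns the lowest-numbered CPU in the intersection, whereas the kernel's cpumask_any_and() only promises some CPU in it. The model also assumes a CPU joins the active mask only after its own check has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 64

static uint64_t cpu_32bit_el0_mask; /* CPUs seen with 32-bit EL0 support */
static uint64_t cpu_active_mask;    /* CPUs already brought up */
static bool offline_disabled[NR_CPUS];
static int pinned_cpu = -1;         /* -1: no 32-bit-capable CPU pinned yet */

static bool mask_test(uint64_t mask, int cpu)
{
	return (mask >> cpu) & 1;
}

/* Lowest-numbered CPU present in both masks, or -1 if none. */
static int mask_first_and(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask_test(both, cpu))
			return cpu;
	return -1;
}

/* Mirrors the patch's decision for one CPU coming online. */
static void check_32bit_mismatch(int cpu, bool cpu_32bit)
{
	int winner;

	if (cpu_32bit)
		cpu_32bit_el0_mask |= UINT64_C(1) << cpu;

	/* Same 32-bit capability as CPU 0: no mismatch. */
	if (mask_test(cpu_32bit_el0_mask, 0) == cpu_32bit)
		return;

	/* A 32-bit-capable CPU has already been pinned. */
	if (pinned_cpu >= 0)
		return;

	/* Pin this CPU if it is 32-bit capable, else an active one that is. */
	winner = cpu_32bit ? cpu
			   : mask_first_and(cpu_32bit_el0_mask, cpu_active_mask);
	/* Cannot fire: a mismatch on a non-32-bit CPU means CPU 0 is 32-bit and active. */
	assert(winner >= 0);
	pinned_cpu = winner;
	offline_disabled[winner] = true;
	printf("Asymmetric 32-bit EL0 support detected on CPU %d; pinning CPU %d\n",
	       cpu, winner);
}

/* Assumption: a CPU is marked active only after its check has run. */
static void bring_up_cpu(int cpu, bool cpu_32bit)
{
	check_32bit_mismatch(cpu, cpu_32bit);
	cpu_active_mask |= UINT64_C(1) << cpu;
}
```

For example, bringing up CPU 0 as 32-bit capable and then CPU 1 without 32-bit EL0 pins CPU 0; a later 32-bit-capable CPU 2 changes nothing, because there is no mismatch relative to CPU 0.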

Setting ->offline_disabled on the CPU's device, as the patch does via
get_cpu_device(), only blocks offline requests that go through
device_offline(). kexec (via smp_shutdown_nonboot_cpus()) and hibernate and
suspend/resume (via freeze_secondary_cpus()) use cpu_online/offline()
directly, so they won't be affected. I have sent patches that make
cpu_online/offline() 'private': they are no longer used or exported outside
of the CPU hotplug subsystem, except by the odd support function for arch
code. All other users now use device_offline() or add/remove_cpu().
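The distinction can be illustrated with a small user-space model. It is only a sketch: struct cpu_dev, internal_cpu_down(), model_device_offline() and model_disable_secondary_cpus() are hypothetical names, not kernel functions. The one behaviour taken from the kernel is that device_offline() (reached, for example, when userspace writes 0 to /sys/devices/system/cpu/cpuN/online) returns -EPERM when the device's offline_disabled flag is set, while the internal paths used for kexec and suspend never look at that flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU device state: only the fields relevant here. */
struct cpu_dev {
	bool online;
	bool offline_disabled;
};

static struct cpu_dev cpus[NR_CPUS];

/* Low-level hotplug path: takes the CPU down without checking offline_disabled. */
static void internal_cpu_down(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].online = false;
}

/* Models device_offline(): the userspace-facing path refuses with -EPERM
 * when offline_disabled is set. */
static int model_device_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpus[cpu].offline_disabled)
		return -EPERM;
	internal_cpu_down(cpu);
	return 0;
}

/* Models the kexec/suspend paths: every CPU except 'primary' goes down
 * through the low-level path, so offline_disabled is never consulted. */
static void model_disable_secondary_cpus(int primary)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != primary)
			internal_cpu_down(cpu);
}
```

With CPU 2 pinned, a userspace offline request for it fails with -EPERM, yet a kexec- or suspend-style shutdown of the secondary CPUs still takes it down.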

In the past I tested kexec, suspend-to-disk and suspend-to-RAM with my
older, similar implementation, so we should be fine.

Cheers

--
Qais Yousef



More information about the linux-arm-kernel mailing list