[RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature

Suzuki K Poulose suzuki.poulose at arm.com
Thu May 22 09:29:16 PDT 2025


On 22/05/2025 16:23, Catalin Marinas wrote:
> Hi Suzuki,
> 
> Thanks for looking at this.
> 
> On Mon, May 19, 2025 at 10:45:31AM +0100, Suzuki K Poulose wrote:
>> On 14/05/2025 13:05, Catalin Marinas wrote:
>>> On Tue, May 13, 2025 at 10:15:49AM +0100, Suzuki K Poulose wrote:
>>>> On 12/05/2025 17:33, Catalin Marinas wrote:
>>>>> Stepping back a bit, we know that the MIDR allow-list implies
>>>>> BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need
>>>>
>>>> Please be aware that BBML2_NOABORT midr list may not always imply BBLM2 in
>>>> ID registers (e.g., AmpereOne. But the plan is to fixup the per cpu
>>>> ID register - struct cpuinfo_arm64 - for such cores at early boot,
>>>> individually, before it is used for sanitisation of the system wide
>>>> copy).
>>>
>>> Ah, good point. We can then ignore BBML2 ID regs and only rely on MIDR
>>> (and some future BBML3).
>>>
>>>>> So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
>>>>> the boot CPU if it has it _but_ cleared by any secondary early CPU if it
>>>>> doesn't (and never enabled by secondary CPUs). When the features are
>>>>> finalised, we know if all early CPUs had it. In combination with
>>>>> PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.
>>>>
>>>> That could work, but it introduces this "clearing" a capability, which
>>>> we don't do at the moment.
>>>>
>>>> We had an offline discussion about this some time ago, with Mark
>>>> Rutland. The best way to deal with this is to change the way we compute
>>>> capabilities. i.e.,
>>>>
>>>>
>>>> 1. Each boot CPU run through all the capabilities and maintain a per-cpu
>>>>      copy of the state.
>>>> 2. System wide capabilities can then be constructed from the all early
>>>>      boot CPU capability state (e.g., ANDing all the state from all CPUs
>>>>      for SCOPE_SYSTEM or ORing for LOCAL_CPU).
>>>>
>>>> But this requires a drastic change to the infrastructure.
>>>
>>> I think it's a lot simpler to achieve the ANDing - set the (system)
>>> capability if detected on the boot CPU, only clear it if missing on
>>> subsequent CPUs. See below on an attempt to introduce this. For lack of
>>> inspiration, I called it ARM64_CPUCAP_GLOBAL_CPU_FEATURE which has both
>>> SCOPE_LOCAL and SCOPE_SYSTEM. It's permitted for late CPUs but not
>>> optional if already enabled. The advantage of having both local&system
>>> is that the match function will be called for both scopes. I added a
>>> mask in to cpucap_default_scope() when calling matches() since so far
>>> no cap had both.
>>
>> Thanks, the change below does the trick. I am reasoning with the way
>> the scope has been defined (hacked ;-)).
>>
>> SCOPE_LOCAL_CPU && SCOPE_SYSTEM
>>
>> 1. SCOPE_LOCAL_CPU : Because you need to run this on all the (early) CPUs.
>>
>> 2. SCOPE_SYSTEM: To check if the capability holds at the end of the
>> smp boot.
>>
>> While, we really "detect" it on SCOPE_BOOT_CPU and only run the
>> cap checks, if that is available. But put another way, BOOT_CPU
>> is only used as an easy way to detect if this CPUs is the first
>> one to run the check vs at least one CPU has run and cleared the
>> cap.
> 
> Yes, we start with boot CPU and keep 'and-ing' new values onto it.
> 
>> I wonder if we could use some other flag to indicate the fact that,
>> a non-boot CPU is allowed to clear the capability explicitly, rather than
>> implying it with SCOPE_SYSTEM && SCOPE_LOCAL_CPU. Or may be make
>> it explicit that the capability must be matched on ALL cpus and
>> finalized at the end ?
> 
> I had such variant locally but then decided to reuse the SCOPE_SYSTEM
> for this, more of a way to indicate that we want something system-wide
> but checked per-CPU. We could add a new flag, though I was wondering
> whether we can have a property that's checked both per-CPU and once more
> system-wide. That's what actually happens with the above, then the probe
> function can tell whether it was called in the CPU or system scope.
> 
> Alternatively, we can leave the local/system combining for later and
> only add a flag to tell how they compose - "any" (default) vs "all".
> 
>> /*
>>   * When paired with SCOPE_LOCAL_CPU, all CPUs must satisfy the
>>   * condition. This is different from SCOPE_SYSTEM, where the check
>>   * is performed only once at the end of SMP boot. But SCOPE_SYSTEM
>>   * may not be sufficient in cases where the capability depends on
>>   * properties that are not "sanitised" (e.g., MIDR_EL1) and must be
>>   * satisfied by all the early SMP boot CPUs.
>>   */
>> #define ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS	((u16)BIT(7))
>>
>> statici inline bool cpucap_match_all_cpus(struct arm64_capability *cap)
>> {
>> 	return !!(cap->type & ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS);
>> }
> 
> Yes, something like this would work.
> 
>> Also, we already go through the capablity list to report the ones
>> with "cpumask" separately, and we could use that to also report
>> the ones with MATCH_ALL_CPUs. Something like:
>>
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 9c4d6d552b25..14cbae51d802 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -3769,10 +3769,15 @@ static void __init setup_system_capabilities(void)
>>          for (int i = 0; i < ARM64_NCAPS; i++) {
>>                  const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];
>>
>> -		if (caps && caps->cpus && caps->desc &&
>> -			cpumask_any(caps->cpus) < nr_cpu_ids)
>> +		if (!caps || !caps->desc)
>> +			continue;
>> +
>> +		if (caps->cpus && cpumask_any(caps->cpus) < nr_cpu_ids)
>> 			pr_info("detected: %s on CPU%*pbl\n",
>> 				caps->desc, cpumask_pr_args(caps->cpus));
>> +
>> +		/* Report capabilities that had to be matched on all CPUs */
>> +		if (capcpucap_match_all_cpus(caps) && cpus_have_cap(caps))
>> +			pr_info("detected: %s\n", caps->desc);
>>          }
> 
> Yeah, I hacked something similar with the 'global' proposal based on
> SCOPE_SYSTEM.
> 
>>> ---------------------8<-----------------------------------------
>>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>>> index c4326f1cb917..0b0b26a6f27b 100644
>>> --- a/arch/arm64/include/asm/cpufeature.h
>>> +++ b/arch/arm64/include/asm/cpufeature.h
>>> @@ -331,6 +331,15 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
>>>    #define ARM64_CPUCAP_BOOT_CPU_FEATURE                  \
>>>    	(ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU)
>>> +/*
>>> + * CPU feature detected at boot time based on all CPUs. It is safe for a late
>>> + * CPU to have this feature even though the system hasn't enabled it, although
>>> + * the feature will not be used by Linux in this case. If the system has
>>> + * enabled this feature already, then every late CPU must have it.
>>> + */
>>> +#define ARM64_CPUCAP_GLOBAL_CPU_FEATURE			
>>
>> #define ARM64_CPUCAP_MATCH_ALL_CPU_FEATURE ?
>>
>> \
>>> +	 (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_SYSTEM_FEATURE)
>>
>>    (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS)
>>
>>
>>> +
>>>    struct arm64_cpu_capabilities {
>>>    	const char *desc;
>>>    	u16 capability;
>>> @@ -391,6 +400,11 @@ static inline int cpucap_default_scope(const struct arm64_cpu_capabilities *cap)
>>>    	return cap->type & ARM64_CPUCAP_SCOPE_MASK;
>>>    }
>>> +static inline bool cpucap_global_scope(const struct arm64_cpu_capabilities *cap)
>>
>> May be call it cpucap_match_all_cpus() ?
> 
> I can respin, the alternative looks good to me.
> 
> Now, we discussed offline of a different approach: for AmpereOne we'll
> have to check MIDR early (as an erratum) and pretend it has BBML2,
> populate the sanitised cpuid regs accordingly. We could do something
> similar for the other CPUs, pretend it's something like BBML3 and get
> the architects to commit to it (but this would delay the patchset).
> 
> TBH, I'd rather not hack this and only rely on the MIDR for BBM_NOABORT
> (without any level) and the above MATCH_ALL_CPUS. My proposal is to
> respin this with a MATCH_ALL_CPUS flag that only checks the MIDR. We can
> later add a SCOPE_SYSTEM to the same capability that would 'or' in the
> BBML3 cap (or just use two capabilities, though we end up with two many
> branches or NOPs in the patched alternatives).

Sounds good to me. Thanks for taking care of this.

Cheers
Suzuki

> 




More information about the linux-arm-kernel mailing list