[PATCH] riscv: Report error when repeatedly recording CPU hardware ID

qiaozhe qiaozhe at iscas.ac.cn
Mon Aug 26 18:39:30 PDT 2024


On 2024/8/26 16:15, Conor Dooley wrote:

> On Sat, Aug 24, 2024 at 10:41:28AM +0800, qiaozhe wrote:
>>
>>
>> On 2024/8/23 20:57, Conor Dooley wrote:
>>
>>> On Fri, Aug 23, 2024 at 05:11:00PM +0800, Zhe Qiao wrote:
>>>> In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[]
>>>> array records the CPU hardware ID, if the same CPU hardware attribute has 
>>>> been recorded, an error report is issued, thereby ensuring the uniqueness
>>>> of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array.
>>> Why is this actually required? On what system did you encounter this?
>> This is not actually a patch submitted for problems encountered in actual 
>> development environments, but rather a comparison of ARM architecture when 
>> I was learning Linux kernel and found similar judgments on ARM architecture.
> Okay, it's good that you didn't find such a bad dtb "in the wild" :)
>
>> In addition, if the same attribute exists on the CPU hardware ID and is 
>> recorded in __cpuid_to_hartid_map[], the kernel may need to make a judgment 
>> on this error.
>>
>>
>>>> Signed-off-by: Zhe Qiao <qiaozhe at iscas.ac.cn>
>>>> ---
>>>>  arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++
>>>>  1 file changed, 16 insertions(+)
>>>>
>>>> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
>>>> index 0f8f1c95ac38..698f9fe791f7 100644
>>>> --- a/arch/riscv/kernel/smpboot.c
>>>> +++ b/arch/riscv/kernel/smpboot.c
>>>> @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void)
>>>>  #define acpi_parse_and_init_cpus(...)	do { } while (0)
>>>>  #endif
>>>>  
>>>> +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart)
>>>> +{
>>>> +	unsigned int i;
>>>> +
>>>> +	for (i = 1; (i < cpuid) && (i < NR_CPUS); i++)
>>>> +		if (cpuid_to_hartid_map(i) == hart)
>>>> +			return true;
>>>> +	return false;
>>>> +}
>>>> +
>>>>  static void __init of_parse_and_init_cpus(void)
>>>>  {
>>>>  	struct device_node *dn;
>>>> @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void)
>>>>  		if (rc < 0)
>>>>  			continue;
>>>>  
>>>> +		if (is_mpidr_duplicate(cpuid, hart)) {
>>>> +			pr_err("%pOF: duplicate cpu reg properties in the DT\n",
>>>> +				dn);
>>>> +			continue;
>>> Why would we continue in this case? If the devicetree is this broken,
>>> why shouldn't we just BUG() and abort immediately?
>> This is because I did not find any judgment on this issue in the previous code 
>> during the analysis process, so I did not take more aggressive measures in this 
>> regard, but only issued an error alarm.
> What do you think though? Should we continue to boot in this case?
> If you read the function a bit further, you'll see that we abort boot
> if there are two instances of the boot CPU. Do you think the same should
> be done for all CPUs?


Yes, I saw that if there are two boot CPUs, a BUG will occur. For all CPUs, 

if there are two CPUs with the same attributes, I think a bug should be generated 

directly. This will attract more attention than issuing false warnings, while also 

reducing the difficulty of troubleshooting related issues.


These are some of my opinions, I don't know if they are reasonable to consider. 

What do you think?


> Thanks,
> Conor.
>
>>>> +		}
>>>> +
>>>>  		if (hart == cpuid_to_hartid_map(0)) {
>>>>  			BUG_ON(found_boot_cpu);
>>>>  			found_boot_cpu = 1;
>>>> -- 
>>>> 2.43.0

Thanks,
Zhe




More information about the linux-riscv mailing list