[PATCH] RISC-V: Dynamically allocate cpumasks and further increase range and default value of NR_CPUS

Tue Sep 17 07:20:12 PDT 2024

On Mon, 05 Aug 2024 01:58:54 PDT (-0700), liuyuntao12 at huawei.com wrote:
> Gentle ping

I think we just need to see some results from real hardware, as QEMU 
isn't meaningful for benchmarks like this.  My guess is we're not going 
to have an answer for a while: RISC-V really isn't anywhere close to 
having systems of this complexity yet.  So for now I think we should 
just leave the defaults alone.  If hardware shows up where it makes 
sense to start changing things, then we can take a look again.

>
> On 2024/6/26 20:41, liuyuntao (F) wrote:
>>
>>
>> On 2024/6/25 19:44, liuyuntao (F) wrote:
>>>
>>>
>>> On 2024/6/25 19:11, Andrew Jones wrote:
>>>> On Fri, Jun 14, 2024 at 07:53:06AM GMT, Yuntao Liu wrote:
>>>>> The default NR_CPUS for riscv64 is currently 64. Since the latest QEMU
>>>>> virt machine supports up to 512 CPUs, set the default NR_CPUS to 512
>>>>> for riscv64.
>>>>>
>>>>> With the backing of RISC-V International and related chip
>>>>> manufacturers, RISC-V has also begun to enter the server market, which
>>>>> demands higher performance. Other major architectures (such as ARM64,
>>>>> x86_64, MIPS, etc.) already allow a higher range, so further increase
>>>>> this range up to 4096 for riscv64.
>>>>>
>>>>> Because increasing NR_CPUS enlarges cpumasks, this could significantly
>>>>> increase stack usage, especially in code that allocates cpumasks on
>>>>> the stack. To address this, we can enable CPUMASK_OFFSTACK, which
>>>>> prevents cpumasks from being allocated on the stack. We choose to
>>>>> enable this feature only when NR_CPUS is greater than 512. Why 512?
>>>>> Above that value, the kernel image is smaller with offstack than
>>>>> with onstack.
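
To put rough numbers on the stack-usage concern above, here is a minimal
sketch of how one cpumask grows with NR_CPUS. It assumes the mask is a
plain bitmap rounded up to whole 64-bit longs, as on riscv64; the helper
name cpumask_bytes is made up for illustration and is not a kernel API.

```python
# Sketch only: size of one on-stack cpumask as a function of NR_CPUS.
# Assumption: a cpumask is a bitmap of NR_CPUS bits stored in 64-bit longs.
BITS_PER_LONG = 64


def cpumask_bytes(nr_cpus: int, bits_per_long: int = BITS_PER_LONG) -> int:
    """Bytes needed for a bitmap of nr_cpus bits, rounded up to whole longs."""
    longs = (nr_cpus + bits_per_long - 1) // bits_per_long
    return longs * (bits_per_long // 8)


if __name__ == "__main__":
    for nr_cpus in (64, 512, 4096):
        print(f"NR_CPUS={nr_cpus:5d}  cpumask={cpumask_bytes(nr_cpus):4d} bytes")
```

Under that assumption a single mask is 8 bytes at NR_CPUS=64, 64 bytes at
512 and 512 bytes at 4096, so a function holding a few masks on the stack
gets noticeably heavier at the top of the proposed range.
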
>>>>
>>>> This isn't the reason why Arm decided to start at 512, afaict. The
>>>> reason for Arm was that hackbench did better with onstack at 256.
>>>> What are the hackbench results for riscv?
>>>
>>> Okay, I will add the hackbench test results soon.
>>
>> Benchmark results using hackbench, averaged over 5 runs of
>> ./hackbench -s 512 -l 20 -g 10 -f 50 -P
>> on QEMU.
>>
>> NR_CPUS     64      128     256     512     1024    2048
>> onstack/s   6.9992  6.6112  6.7834  6.6578  6.6646  6.8692
>> offstack/s  6.5616  6.95    6.5698  6.91    6.663   6.8202
>> difference  -6.25%  +5.12%  -3.15%  +3.79%  -0.02%  -0.71%
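
For anyone rerunning this on real hardware, the "difference" row can be
recomputed as below. This is a sketch that assumes the row is
(offstack - onstack) / onstack, which matches the posted numbers; the
variable and function names are illustrative only.

```python
# Sketch only: recompute the "difference" row of the hackbench table.
# Times are seconds, copied from the table; lower is better.
NR_CPUS = [64, 128, 256, 512, 1024, 2048]
ONSTACK = [6.9992, 6.6112, 6.7834, 6.6578, 6.6646, 6.8692]
OFFSTACK = [6.5616, 6.95, 6.5698, 6.91, 6.663, 6.8202]


def relative_diff_percent(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


diffs = [relative_diff_percent(on, off) for on, off in zip(ONSTACK, OFFSTACK)]

if __name__ == "__main__":
    for nr_cpus, diff in zip(NR_CPUS, diffs):
        print(f"NR_CPUS={nr_cpus:5d}  offstack vs onstack: {diff:+.2f}%")
```

A negative value means offstack was faster. The sign flips from column to
column for the smaller NR_CPUS values, which is at least consistent with
run-to-run noise rather than a real trend.
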
>>
>> With more cores the fluctuation is minimal, which suggests that the
>> performance gap shrinks as NR_CPUS grows.
>> Since I don't have a RISC-V single-board computer, these results come
>> from testing in QEMU and may differ from real hardware. Perhaps
>> someone could help with the testing.
>>
>> Thanks,
>> Yuntao
>>
>>>
>>>>
>>>>>
>>>>> vmlinux size comparison (difference relative to the
>>>>> vmlinux_onstack_NR_CPUS baseline):
>>>>>
>>>>> NR_CPUS     256         512         1024        2048        4096
>>>>> onstack     19814536    19840760    19880584    19969672    20141704
>>>>> offstack    19819144    19840936    19880480    19968544    20135456
>>>>> difference  +0.023%     +0.001%     -0.001%     -0.006%     -0.031%
>>>>> is_smaller  n           n           y           y           y
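
The size table is easier to judge in absolute bytes. A small sketch that
derives the byte delta and the percentage from the posted vmlinux sizes,
again assuming the onstack build is the baseline (names are illustrative):

```python
# Sketch only: byte and percentage deltas between offstack and onstack
# vmlinux sizes. Sizes in bytes are copied from the table above.
SIZES = {
    256: (19814536, 19819144),
    512: (19840760, 19840936),
    1024: (19880584, 19880480),
    2048: (19969672, 19968544),
    4096: (20141704, 20135456),
}


def size_delta(onstack: int, offstack: int) -> tuple[int, float]:
    """Return (delta in bytes, delta in percent of the onstack size)."""
    delta = offstack - onstack
    return delta, delta / onstack * 100.0


deltas = {nr: size_delta(on, off) for nr, (on, off) in SIZES.items()}

if __name__ == "__main__":
    for nr_cpus, (delta, pct) in deltas.items():
        smaller = "y" if delta < 0 else "n"
        print(f"NR_CPUS={nr_cpus:5d}  {delta:+6d} bytes  {pct:+.3f}%  {smaller}")
```

The largest saving is roughly 6 KiB at NR_CPUS=4096 on a ~20 MB image, so
image size alone is a weak basis for picking the 512 threshold.
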
>>>>
>>>> Since the savings are almost nothing we must not have too many global
>>>> cpumasks. But I'm in favor of ensuring stack depths stay under control,
>>>> so turning on CPUMASK_OFFSTACK sounds good to me in general.
>>>>
>>>>>
>>>>> Signed-off-by: Yuntao Liu <liuyuntao12 at huawei.com>
>>>>> ---
>>>>>   arch/riscv/Kconfig | 5 +++--
>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>>> index 0525ee2d63c7..5960713b3bf9 100644
>>>>> --- a/arch/riscv/Kconfig
>>>>> +++ b/arch/riscv/Kconfig
>>>>> @@ -77,6 +77,7 @@ config RISCV
>>>>>       select CLINT_TIMER if RISCV_M_MODE
>>>>>       select CLONE_BACKWARDS
>>>>>       select COMMON_CLK
>>>>> +    select CPUMASK_OFFSTACK if NR_CPUS > 512
>>>>>       select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
>>>>>       select EDAC_SUPPORT
>>>>>       select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
>>>>> @@ -428,11 +429,11 @@ config SCHED_MC
>>>>>   config NR_CPUS
>>>>>       int "Maximum number of CPUs (2-512)"
>>>>>       depends on SMP
>>>>> -    range 2 512 if !RISCV_SBI_V01
>>>>> +    range 2 4096 if !RISCV_SBI_V01
>>>>>       range 2 32 if RISCV_SBI_V01 && 32BIT
>>>>>       range 2 64 if RISCV_SBI_V01 && 64BIT
>>>>>       default "32" if 32BIT
>>>>> -    default "64" if 64BIT
>>>>> +    default "512" if 64BIT
>>>>
>>>> This is somewhat reasonable, even if nothing is going to use this for
>>>> quite a while, since it'll help avoid bugs popping up when NR_CPUS gets
>>>> bumped later, but it feels excessive right now for riscv, so I'm a bit
>>>> on the fence about it. Maybe if hackbench doesn't show any issues we
>>>> could turn CPUMASK_OFFSTACK on for a smaller NR_CPUS and also select
>>>> a smaller default?
>>>>
>>
>> It seems that hackbench performs better when NR_CPUS is larger. Which
>> NR_CPUS value would you prefer?
>>
>>>> Thanks,
>>>> drew
>>>>
>>>>>   config HOTPLUG_CPU
>>>>>       bool "Support for hot-pluggable CPUs"
>>>>> --
>>>>> 2.34.1