[PATCH v3 0/6] Sparse HART id support

Palmer Dabbelt palmer at dabbelt.com
Thu Jan 20 10:17:22 PST 2022


On Thu, 20 Jan 2022 01:09:12 PST (-0800), Atish Patra wrote:
> Currently, sparse hartids are not supported by Linux on RISC-V for the
> following reasons:
> 1. Both the spinwait and ordered booting methods use
>    __cpu_up_stack/task_pointer, which are arrays of size NR_CPUS.
> 2. During early boot, any hart with a hartid greater than NR_CPUS is not
>    booted at all.
> 3. riscv_cpuid_to_hartid_mask uses struct cpumask to generate the hartid
>    bitmap.
> 4. The SBI v0.2 implementation uses NR_CPUS as the maximum hartid number
>    while generating the hartmask.
>
> To support sparse hartids, hartid and NR_CPUS need to be decoupled; tying
> them together was logically incorrect anyway. NR_CPUS represents the maximum
> logical CPU id configured in the kernel, while the hartid is the physical
> hart id stored in the mhartid CSR defined by the privileged specification.
> Thus, a hartid can have a much greater value than a logical cpuid.
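
The distinction between logical cpuids and physical hartids can be sketched
in plain user-space C. This is only an illustration, not the kernel's code:
the table name, the helper names, the INVALID_HARTID marker and the value of
NR_CPUS are all assumptions made for the example. The point is that the
table is indexed by logical cpuid only, so a large hartid never becomes an
array index.

```c
#define NR_CPUS 4                  /* assumed small logical CPU limit */
#define INVALID_HARTID (~0UL)      /* assumed marker for an unused slot */

/* Hypothetical table: index is the logical cpuid, value is the hartid. */
static unsigned long cpuid_to_hartid[NR_CPUS] = {
	INVALID_HARTID, INVALID_HARTID, INVALID_HARTID, INVALID_HARTID
};

/* Store a hart under the next free logical id; return that id, or -1 if full. */
static int register_hart(unsigned long hartid)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpuid_to_hartid[cpu] == INVALID_HARTID) {
			cpuid_to_hartid[cpu] = hartid;
			return cpu;
		}
	}
	return -1;
}

/* Reverse lookup by linear scan; the hartid value is never used as an index. */
static int hartid_to_cpuid(unsigned long hartid)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpuid_to_hartid[cpu] == hartid)
			return cpu;
	return -1;
}
```

With this layout, a hart with hartid 0x4000 can occupy logical cpuid 1 even
though NR_CPUS is 4. Any array sized by NR_CPUS must therefore be indexed by
the logical id, never by the hartid.
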
>
> Currently, we have two booting methods. In ordered booting, the booting
> hart brings up each non-booting hart one by one using the SBI HSM extension.
> The spinwait booting method relies on harts jumping into the Linux kernel at
> random, and the boot hart is selected by a lottery. All other non-booting
> harts keep spinning on __cpu_up_stack/task_pointer until the boot hart
> initializes the data. Both methods rely on __cpu_up_stack/task_pointer to
> set up the stack/task pointer. The spinwait method is mostly used to support
> older firmware without the SBI HSM extension and M-mode Linux. The ordered
> booting method is the preferred method for booting general Linux because it
> can support cpu hotplug and kexec.
>
> The first patch modifies the ordered booting method to use an opaque
> parameter already available in the HSM start API to set up the stack/task
> pointer. The third patch resolves issue #1 by limiting the usage of
> __cpu_up_stack/task_pointer to the spinwait booting method. The fourth
> and fifth patches move the entire hart lottery selection and the spinwait
> method to a separate config that can be disabled if required, which solves
> issue #2. The sixth patch solves issues #3 and #4 by removing
> riscv_cpuid_to_hartid_mask completely. All the SBI APIs now directly take a
> pointer to struct cpumask, and the SBI implementation takes care of
> generating the hart bitmap from the cpumask.
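
One way to generate a hart bitmap without sizing anything by NR_CPUS is to
express the target harts as (hart mask, hart mask base) pairs, which is the
form SBI v0.2 and later calls accept. The user-space sketch below assumes
the hartids have already been looked up from the logical cpumask. The struct
and function names are hypothetical, and the greedy windowing is one
possible approach rather than necessarily what the patch does.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* One (hart_mask, hart_mask_base) pair, as passed to a single SBI call. */
struct hart_window {
	unsigned long hmask;   /* bit i set means hart (hbase + i) is targeted */
	unsigned long hbase;   /* hartid represented by bit 0 of hmask */
};

/*
 * Hypothetical helper: greedily pack hartids into windows of BITS_PER_LONG
 * consecutive ids. Writes at most max windows to out[] and returns how many
 * were written; hartids that do not fit once out[] is full are dropped.
 */
static size_t build_hart_windows(const unsigned long *hartids, size_t n,
				 struct hart_window *out, size_t max)
{
	unsigned long hmask = 0, hbase = 0;
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long hartid = hartids[i];

		/* Emit the current window if this hartid falls outside it. */
		if (hmask && (hartid < hbase || hartid - hbase >= BITS_PER_LONG)) {
			if (used == max)
				return used;
			out[used++] = (struct hart_window){ hmask, hbase };
			hmask = 0;
		}
		/* An empty window starts at the current hartid. */
		if (!hmask)
			hbase = hartid;
		hmask |= 1UL << (hartid - hbase);
	}
	/* Emit the last, partially filled window. */
	if (hmask && used < max)
		out[used++] = (struct hart_window){ hmask, hbase };
	return used;
}
```

For hartids 0, 1, 4, 1000 and 1003 this yields two windows: mask 0x13 at
base 0 and mask 0x9 at base 1000. Each window would correspond to one SBI
call, so the largest hartid never has to fit in a fixed-size bitmap.
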
>
> It is not trivial to support sparse hartids with the spinwait booting
> method, and there are no use cases that require it either. Any platform
> with sparse hartids will probably require more advanced features such as
> cpu hotplug and kexec. Thus, the series supports sparse hartids via the
> ordered booting method only. To maintain backward compatibility, the
> spinwait booting method is currently enabled in defconfig so that M-mode
> Linux will continue to work. Any platform that requires sparse hartid
> support must disable the spinwait method.
>
> This series also fixes the out-of-bounds access error[1] reported by Geert.
> The issue can be reproduced by booting SMP with NR_CPUS=4 on platforms with
> discontiguous hart numbering (HiFive Unleashed/Unmatched & PolarFire).
> The spinwait method should also be disabled for any configuration where the
> NR_CPUS value is less than the maximum hartid on the platform.
>
> [1] https://lore.kernel.org/lkml/CAMuHMdUPWOjJfJohxLJefHOrJBtXZ0xfHQt4=hXpUXnasiN+AQ@mail.gmail.com/#t
>
> The series is based on the queue branch of kvm-riscv as it has KVM-related
> changes as well. I have tested it on HiFive Unmatched and QEMU.
>
> Changes from v2->v3:
> 1. Rebased on linux-next
> 2. Removed the redundant variable in PATCH 1.
> 3. Added the reviewed-by/acked-by tags.
>
> Changes from v1->v2:
> 1. Fixed a few typos in Kconfig.
> 2. Moved the boot data structure offsets to asm-offsets.c
> 3. Removed the redundant config check in head.S
>
> Atish Patra (6):
> RISC-V: Avoid using per cpu array for ordered booting
> RISC-V: Do not print the SBI version during HSM extension boot print
> RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
> RISC-V: Move the entire hart selection via lottery to SMP
> RISC-V: Move spinwait booting method to its own config
> RISC-V: Do not use cpumask data structure for hartid bitmap
>
> arch/riscv/Kconfig                   |  14 ++
> arch/riscv/include/asm/cpu_ops.h     |   2 -
> arch/riscv/include/asm/cpu_ops_sbi.h |  25 ++++
> arch/riscv/include/asm/sbi.h         |  19 +--
> arch/riscv/include/asm/smp.h         |   2 -
> arch/riscv/kernel/Makefile           |   3 +-
> arch/riscv/kernel/asm-offsets.c      |   3 +
> arch/riscv/kernel/cpu_ops.c          |  26 ++--
> arch/riscv/kernel/cpu_ops_sbi.c      |  26 +++-
> arch/riscv/kernel/cpu_ops_spinwait.c |  27 +++-
> arch/riscv/kernel/head.S             |  35 ++---
> arch/riscv/kernel/head.h             |   6 +-
> arch/riscv/kernel/sbi.c              | 189 +++++++++++++++------------
> arch/riscv/kernel/setup.c            |  10 --
> arch/riscv/kernel/smpboot.c          |   2 +-
> arch/riscv/kvm/mmu.c                 |   4 +-
> arch/riscv/kvm/vcpu_sbi_replace.c    |  11 +-
> arch/riscv/kvm/vcpu_sbi_v01.c        |  11 +-
> arch/riscv/kvm/vmid.c                |   4 +-
> arch/riscv/mm/cacheflush.c           |   5 +-
> arch/riscv/mm/tlbflush.c             |   9 +-
> 21 files changed, 253 insertions(+), 180 deletions(-)
> create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h

Thanks, these are on for-next.


