Out-of-bounds access when hartid >= NR_CPUS

Tue Oct 26 01:55:09 PDT 2021

On Mon, Oct 25, 2021 at 8:54 AM Geert Uytterhoeven <geert at linux-m68k.org> wrote:
>
> Hi all,
>
> When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
> the 4th CPU either fails to come online, or the system crashes.
>
> This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
> and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
>   - unused core has hartid 0 (sifive,e51),
>   - processor 0 has hartid 1 (sifive,u54-mc),
>   - processor 1 has hartid 2 (sifive,u54-mc),
>   - processor 2 has hartid 3 (sifive,u54-mc),
>   - processor 3 has hartid 4 (sifive,u54-mc).
>
> I assume the same issue is present on the SiFive fu540 and fu740
> SoCs, but I don't have access to these.  The issue is not present
> on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
> hartid 0.
>
> arch/riscv/kernel/cpu_ops.c has:
>
>     void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
>     void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
>
>     void cpu_update_secondary_bootdata(unsigned int cpuid,
>                                        struct task_struct *tidle)
>     {
>             int hartid = cpuid_to_hartid_map(cpuid);
>
>             /* Make sure tidle is updated */
>             smp_mb();
>             WRITE_ONCE(__cpu_up_stack_pointer[hartid],
>                        task_stack_page(tidle) + THREAD_SIZE);
>             WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
>
> The above two writes cause out-of-bounds accesses beyond
> __cpu_up_{stack,task}_pointer[] if hartid >= CONFIG_NR_CPUS.
>
>     }
>

Thanks for reporting this. We need to fix this properly, and definitely
shouldn't hide it behind config options. I never tested with low values
(2 or 4) of CONFIG_NR_CPUS, which explains why this bug went unnoticed
until now.

> arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:
>
>     for_each_of_cpu_node(dn) {
>             hart = riscv_of_processor_hartid(dn);
>             if (hart < 0)
>                     continue;
>
>             if (hart == cpuid_to_hartid_map(0)) {
>                     BUG_ON(found_boot_cpu);
>                     found_boot_cpu = 1;
>                     early_map_cpu_to_node(0, of_node_to_nid(dn));
>                     continue;
>             }
>             if (cpuid >= NR_CPUS) {
>                     pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
>                             cpuid, hart);
>                     break;
>             }
>
>             cpuid_to_hartid_map(cpuid) = hart;
>             early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
>             cpuid++;
>     }
>
> So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.
>
> How to fix this?
>
> We could skip hartids >= NR_CPUS, but that feels strange to me, as
> you need NR_CPUS to be larger (much larger if the first usable hartid
> is a large number) than the number of CPUs used.
>
> We could store the minimum hartid, and always subtract that when
> accessing __cpu_up_{stack,task}_pointer[] (also in
> arch/riscv/kernel/head.S), but that means unused cores cannot be in the
> middle of the hartid range.

Yeah. Neither of the above proposed solutions is ideal.
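
To put rough numbers on that, here is a small user-space sketch (not kernel
code) of how many array slots each proposal needs for a given set of usable
hartids. The helper names slots_direct() and slots_min_offset() are made up
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Slots needed when the boot arrays are indexed directly by hartid. */
static size_t slots_direct(const unsigned long *hartids, size_t n)
{
	unsigned long max = 0;

	assert(hartids != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (hartids[i] > max)
			max = hartids[i];
	return n ? (size_t)max + 1 : 0;
}

/* Slots needed when the minimum hartid is subtracted before indexing. */
static size_t slots_min_offset(const unsigned long *hartids, size_t n)
{
	unsigned long min, max;

	assert(hartids != NULL || n == 0);
	if (n == 0)
		return 0;
	min = max = hartids[0];
	for (size_t i = 1; i < n; i++) {
		if (hartids[i] < min)
			min = hartids[i];
		if (hartids[i] > max)
			max = hartids[i];
	}
	return (size_t)(max - min) + 1;
}
```

For the PolarFire layout (usable hartids 1-4), direct indexing needs 5 slots
while the offset scheme needs 4. For a hypothetical layout with a hole in the
middle (say hartids 1, 2, 5, 6), both need more slots than there are CPUs:
7 and 6 respectively.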

>
> Are hartids guaranteed to be contiguous? If not, we have no choice but
> to index __cpu_up_{stack,task}_pointer[] by cpuid instead, which
> needs a more expensive conversion in arch/riscv/kernel/head.S.
>

This will work for ordered booting with the SBI HSM extension. However, it
may fail for spinwait booting, because cpuid_to_hartid_map might not have
been set up yet, depending on when the secondary harts jump into Linux.

Ideally, __cpu_up_{stack,task}_pointer[] should be sized to cover the
maximum possible hartid. How about adding a config option for that?

We also need sanity checks in cpu_update_secondary_bootdata() to make sure
that the hartid is within bounds, to avoid issues caused by a suboptimal
config value.
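
Something along these lines, as a user-space sketch only. MAX_HARTID is a
hypothetical stand-in for the proposed config option, the function name and
return convention are made up, and the barrier and WRITE_ONCE() used by the
real code are left out:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HARTID	4	/* hypothetical: largest hartid expected */
#define NR_BOOT_SLOTS	(MAX_HARTID + 1)

static_assert(NR_BOOT_SLOTS > MAX_HARTID, "MAX_HARTID must have a slot");

/* One slot per possible hartid, not per cpuid. */
static void *cpu_up_stack_pointer[NR_BOOT_SLOTS];
static void *cpu_up_task_pointer[NR_BOOT_SLOTS];

/*
 * Publish the boot data for a secondary hart.
 * Returns 0 on success, -1 if hartid has no slot in the arrays.
 */
static int update_secondary_bootdata(unsigned long hartid,
				     void *stack_top, void *tidle)
{
	if (hartid >= NR_BOOT_SLOTS)
		return -1;	/* refuse instead of writing out of bounds */

	cpu_up_stack_pointer[hartid] = stack_top;
	cpu_up_task_pointer[hartid] = tidle;
	return 0;
}
```

The caller would then have to treat a failure as "this CPU cannot be brought
up" rather than carry on.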

> Thanks for your comments!
>
> Gr{oetje,eeting}s,
>
>                         Geert

-- 
Regards,
Atish