[PATCH] riscv: drop __init from vec_check_unaligned_access_speed_all_cpus

Anirudh Srinivasan asrinivasan at oss.tenstorrent.com
Mon Jun 15 10:22:18 PDT 2026


Hi,

On Fri, Jun 12, 2026 at 4:23 PM Nam Cao <namcao at linutronix.de> wrote:
>
> Anirudh Srinivasan <asrinivasan at oss.tenstorrent.com> writes:
> > This function runs within a kthread and need not necessarily finish
> > before system finishes boot and free_initmem() unmaps the .init.text
> > section. This function makes calls to SBI for probing unaligned access
> > speed, and if this is slow for some reason (say some debug prints were
> > added to SBI), the kthread can still be running at this point and result
> > in an instruction page fault when trying to fetch from the freed region.
> ...
> > Drop __init from its signature so that this doesn't happen.
>
> That should work.
>
> But we should really take a step back and reconsider whether running the
> vector access speed probe in a kthread is really a good idea.
>
> We had a problem in the past where the kthread might not complete before
> the user reads the vDSO, so the user gets incorrect values. That was addressed by
> 5d15d2ad36b0 ("riscv: hwprobe: Fix stale vDSO data for late-initialized
> keys at boot") which complicates things.
>
> And now you discover another issue.
>
> The motivation for using a kthread is to avoid boot time slow down. But
> this has been bothering me for quite a while now, because I am not sure
> if using kthread really speeds things up. Sooner or later, the kthread
> has to run. If it runs before the kernel is done booting, then the boot
> is even slower due to overhead of the kthread. If it runs after the
> kernel finishes booting, then we run into these kinds of
> headaches. Unfortunately I do not have a riscv cpu with vector to confirm
> my suspicion.
>
> Furthermore, the vector access speed probe takes the same amount of time
> as scalar access speed probe. The scalar one is done without any
> kthread, and no one ever complained about boot time issue (well, someone
> did complain but that has nothing to do with kthread. Their 64-core (?)
> system is slower because the probe was done serially, and we switched to
> parallel probe and it was fine).
>
> So I think we should really get rid of that kthread entirely; the
> headache is not worth it. That also allows reverting 5d15d2ad36b0 ("riscv:
> hwprobe: Fix stale vDSO data for late-initialized keys at boot"), making
> the code simpler.
>
> Below is a patch that has only been tested with qemu. It reverts the
> mentioned commit and removes the kthread.
>

I tested your patch on a SiFive X280 (RVV 1.0, VLEN 512) and on a patched
version of QEMU (stock QEMU doesn't trap on misaligned vector
loads/stores, so I patched it to do that), and it seems to fix my
error in both cases.

Not sure if this is useful, but here are the boot timings on the X280:

Before your patch (vec_check_unaligned_access_speed_all_cpus runs in kthread)
[    0.393029] cpu1: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.393035] cpu2: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.393037] cpu0: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.393051] cpu3: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.417932] cpu2: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.417937] cpu0: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.417958] cpu1: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.422175] cpu3: vector unaligned word access speed is 0.01x byte access speed (slow)
[    0.423780] clk: Disabling unused clocks
[    0.427185] PM: genpd: Disabling unused power domains
[    0.458801] EXT4-fs (vda): recovery complete
[    0.459069] EXT4-fs (vda): mounted filesystem 2c88c5b4-f269-4b04-a4c4-2e7270c53919 r/w with ordered data mode. Quota mode: disabled.
[    0.459146] VFS: Mounted root (ext4 filesystem) on device 254:0.
[    0.461981] devtmpfs: mounted
[    0.462101] VFS: Pivoted into new rootfs
[    0.464377] Freeing unused kernel image (initmem) memory: 2264K
[    0.464474] Run /sbin/init as init process

After your patch (vec_check_unaligned_access_speed_all_cpus doesn't run in kthread)
[    0.393026] cpu0: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.393027] cpu1: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.393029] cpu3: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.393050] cpu2: scalar unaligned word access speed is 0.01x byte access speed (slow)
[    0.417781] cpu1: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.417784] cpu2: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.417797] cpu3: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.417803] cpu0: vector unaligned word access speed is 0.00x byte access speed (slow)
[    0.433944] clk: Disabling unused clocks
[    0.433962] PM: genpd: Disabling unused power domains
[    0.446730] EXT4-fs (vda): recovery complete
[    0.447071] EXT4-fs (vda): mounted filesystem 2c88c5b4-f269-4b04-a4c4-2e7270c53919 r/w with ordered data mode. Quota mode: disabled.
[    0.447134] VFS: Mounted root (ext4 filesystem) on device 254:0.
[    0.451719] devtmpfs: mounted
[    0.451840] VFS: Pivoted into new rootfs
[    0.455658] Freeing unused kernel image (initmem) memory: 2264K
[    0.457991] Run /sbin/init as init process

I think the difference is too small here to draw any conclusions.
This is only a 4-core CPU.
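
For reference, here is a rough sketch of the arithmetic behind that
comparison. The timestamps are copied from the two logs above; the event
labels and the helper name are my own, and a single boot per configuration
says nothing about run-to-run noise, so treat the numbers as indicative only.

```python
# Timestamps in seconds, copied from the "before" and "after" boot logs.
# The dictionary keys are illustrative labels, not kernel terminology.
before = {
    "last vector probe": 0.422175,
    "initmem freed": 0.464377,
    "run init": 0.464474,
}
after = {
    "last vector probe": 0.417803,
    "initmem freed": 0.455658,
    "run init": 0.457991,
}


def delta_us(key):
    """Return (after - before) for one event, rounded to whole microseconds."""
    return round((after[key] - before[key]) * 1_000_000)


deltas = {key: delta_us(key) for key in before}
for key, value in deltas.items():
    print(f"{key}: {value:+d} us")
```

Each event lands a few milliseconds earlier with the patch (about 6.5 ms
for "Run /sbin/init"), which is small next to the ~0.46 s total.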


