[PATCH v3] riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot

Tue Jun 24 02:54:39 PDT 2025

Hi Alexandre, Palmer,

On 2025/6/23 20:15, Alexandre Ghiti wrote:
> Hi Jingwei, Palmer,
>
> On 6/20/25 10:43, Jingwei Wang wrote:
>>
>> Hi Palmer,
>> On 2025/6/11 06:25, Palmer Dabbelt wrote:
>>> On Wed, 28 May 2025 07:28:19 PDT (-0700), wangjingwei at iscas.ac.cn wrote:
>>>> The riscv_hwprobe vDSO data is populated by init_hwprobe_vdso_data(),
>>>> an arch_initcall_sync. However, underlying data for some keys, like
>>>> RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF, is determined asynchronously.
>>>>
>>>> Specifically, the per_cpu(vector_misaligned_access, cpu) values are set
>>>> by the vec_check_unaligned_access_speed_all_cpus kthread. This kthread
>>>> is spawned by an earlier arch_initcall (check_unaligned_access_all_cpus)
>>>> and may complete its benchmark *after* init_hwprobe_vdso_data() has
>>>> already populated the vDSO with default/stale values.
>>>
>>> IIUC there's another race here: we don't ensure these complete before
>>> allowing userspace to see the values, so if these took so long to probe
>>> userspace started to make hwprobe() calls before they got scheduled we'd
>>> be providing the wrong answer.
>>>
>>> Unless I'm just missing something, though -- I thought we'd looked at that
>>> case?
>>>
>> Thanks for the review. You're right, my current patch doesn't fix the race
>> condition with userspace.
>
>
> I don't think there could be a race since all initcalls are executed sequentially, 
> meaning userspace won't be up before the arch_initcall level is finished. 
>
Yes, the initcall sequence itself provides a guarantee before userspace starts.  
The theoretical race condition I'm concerned about comes from kthread_run, 
which  hands off the probe's execution to the scheduler. This introduces an 
uncertainty  in timing that we shouldn't rely on. The completion mechanism is 
intended to  replace this timing assumption with an explicit, guaranteed 
synchronization point.
>
> But that means that this patch in its current form will probably slow down the 
> whole boot process. To avoid that (and in addition to this patch), can we move 
> init_hwprobe_vdso_data() to late_initcall()? 
>
That's a great point. To check the actual impact, I ran a test on a 64-core QEMU 
setup where I deliberately increased the probe time (from ~2ms to ~256ms by  
adjusting JIFFIES). Even in this worst-case scenario, the probe consistently 
finished  during the subsystem_initcalls. This suggests the real-world 
slowdown from  waiting is quite minimal.  

However, I completely agree with your suggestion. Combining completion with 
a later  initcall is the ideal solution to guarantee both correctness and 
performance. Since  the vDSO data is only consumed by userspace, moving its 
initialization to  late_initcall is perfectly safe and makes the most sense.
>
>>
>> The robust solution here is to use the kernel's `completion`. I've tested
>> this approach: the async probing thread calls `complete()` when finished,
>> and `init_hwprobe_vdso_data()` blocks on `wait_for_completion()`. This
>> guarantees the vDSO data is finalized before userspace can access it.
>>
>>>> So, refresh the vDSO data for specified keys (e.g.,
>>>> MISALIGNED_VECTOR_PERF) ensuring it reflects the final boot-time values.
>>>>
>>>> Test by comparing vDSO and syscall results for affected keys
>>>> (e.g., MISALIGNED_VECTOR_PERF), which now match their final
>>>> boot-time values.
>>>
>>> Wouldn't all the other keys we probe via workqueue be racy as well?
>>>
>> The completion mechanism is easily reusable. If this approach is accepted,
>> I will then identify other potential probe keys and integrate them into
>> this synchronization logic.
>
>
> Yes, I'd say that's the right way to do, there aren't lots of asynchronous 
> initialization stuff so we can handle that when new ones land. 
>
Thank you. I'll prepare a new version of the patch that implements this  
completion + late_initcall approach. I've also had a look, and for now, the  
asynchronous probe for vector misalignment seems to be a special case. The  
completion framework can be easily extended if similar ones land in the future. 

Thanks, 
Jingwei