[PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash

Palmer Dabbelt palmer at rivosinc.com
Thu Dec 8 15:30:08 PST 2022


On Fri, 18 Nov 2022 12:57:21 PST (-0800), geomatsi at gmail.com wrote:
> Hi Guo Ren,
>
>
>> After use_asid_allocator is enabled, userspace applications crash
>> because of stale TLB entries: calling cpumask_clear_cpu without
>> local_flush_tlb_all cannot guarantee that the CPU's TLB entries are
>> fresh. set_mm_asid can then let a userspace application read a stale
>> value through a stale TLB entry, whereas set_mm_noasid is unaffected.
>
> ... [snip]
>
>> +	/*
>> +	 * The mm_cpumask indicates which harts' TLBs contain the virtual
>> +	 * address mapping of the mm. Compared to noasid, using asid
>> +	 * can't guarantee that stale TLB entries are invalidated because
>> +	 * the asid mechanism wouldn't flush TLB for every switch_mm for
>> +	 * performance. So when using asid, keep all CPUs' footprints in
>> +	 * cpumask() until mm reset.
>> +	 */
>> +	cpumask_set_cpu(cpu, mm_cpumask(next));
>> +	if (static_branch_unlikely(&use_asid_allocator)) {
>> +		set_mm_asid(next, cpu);
>> +	} else {
>> +		cpumask_clear_cpu(cpu, mm_cpumask(prev));
>> +		set_mm_noasid(next);
>> +	}
>>  }
>
> I observe similar user-space crashes on my SMP systems with ASID enabled.
> My attempt to fix the issue was a bit different; see the following patch:
>
> https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/
>
> In brief, the idea was borrowed from flush_icache_mm handling:
> - keep track of CPUs not running the task
> - perform per-ASID TLB flush on such CPUs only if the task is switched there

That approach looks better to me: leaking hartids in the ASID allocator
might make the crashes go away, but it will trend towards flushing
everything, and that doesn't seem like the right long-term solution.
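
For readers following along, the tracking scheme in the quoted bullet list
can be sketched as a small user-space model. This is only an illustration
under stated assumptions, not the code from either patch: struct mm_sim,
NR_CPUS_SIM, flush_count and every function name below are hypothetical
stand-ins, and the "flush" is just a counter so the behaviour is observable.

```c
#include <stdint.h>

#define NR_CPUS_SIM 4 /* hypothetical small SMP system */

/* Hypothetical stand-in for per-mm state; not the kernel's mm_context_t. */
struct mm_sim {
	uint32_t asid;
	uint32_t running_mask; /* bit n set: CPU n currently runs this mm */
	uint32_t stale_mask;   /* bit n set: CPU n must flush before reuse */
};

/* Counts simulated per-ASID flushes on each CPU. */
static unsigned int flush_count[NR_CPUS_SIM];

/* Simulated per-ASID local TLB flush: only bumps a counter. */
static void flush_asid_sim(unsigned int cpu, uint32_t asid)
{
	(void)asid;
	flush_count[cpu]++;
}

/*
 * The mm's mappings changed: flush immediately on CPUs running the mm,
 * and only mark the remaining CPUs as stale instead of flushing them.
 */
static void flush_mm_sim(struct mm_sim *mm)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		if (mm->running_mask & (1u << cpu))
			flush_asid_sim(cpu, mm->asid);
		else
			mm->stale_mask |= 1u << cpu;
	}
}

/* Task is switched onto a CPU: pay for the deferred flush only if needed. */
static void switch_in_sim(struct mm_sim *mm, unsigned int cpu)
{
	if (mm->stale_mask & (1u << cpu)) {
		mm->stale_mask &= ~(1u << cpu);
		flush_asid_sim(cpu, mm->asid);
	}
	mm->running_mask |= 1u << cpu;
}

/* Task is switched away from a CPU: no flush, just stop tracking it. */
static void switch_out_sim(struct mm_sim *mm, unsigned int cpu)
{
	mm->running_mask &= ~(1u << cpu);
}
```

In this model a CPU that never runs the task again never pays for a flush,
while a CPU that does pick the task up flushes exactly once per batch of
missed updates.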

So I've put that one on for-next; sorry I missed it before.

Thanks!

>
> Your patch also works fine in my tests, fixing those crashes. I have a
> question, though, regarding the removed cpumask_clear_cpu: how are CPUs
> that are no longer running the task removed from its mm_cpumask? If they
> are not removed, then flush_tlb_mm/flush_tlb_page will broadcast
> unnecessary TLB flushes to those CPUs when ASID is enabled.
>
> Regards,
> Sergey


