[PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for "preventive" TLB flush

Alexandre Ghiti alex at ghiti.fr
Tue Aug 8 03:16:50 PDT 2023


Hi Dylan,

On 07/08/2023 10:23, Dylan Jhong wrote:
> Since the RISC-V architecture allows microarchitectures to cache invalid entries
> in the TLB, it is necessary to issue a "preventive" SFENCE.VMA to ensure that each
> core obtains the correct kernel mapping.
>
> The patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
> page table mappings created via vmap/vmalloc() are updated before switching MM.
>
> Signed-off-by: Dylan Jhong <dylan at andestech.com>
> ---
>   arch/riscv/include/asm/page.h |  2 ++
>   arch/riscv/mm/tlbflush.c      | 12 ++++++++++++
>   2 files changed, 14 insertions(+)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index b55ba20903ec..6c86ab69687e 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -21,6 +21,8 @@
>   #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
>   #define HUGETLB_PAGE_ORDER      (HPAGE_SHIFT - PAGE_SHIFT)
>   
> +#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PTE_MODIFIED
> +
>   /*
>    * PAGE_OFFSET -- the first address of the first page of memory.
>    * When not using MMU this corresponds to the first free page in
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 77be59aadc73..d63364948c85 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>   	__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
>   }
>   #endif
> +
> +/*
> + * Since the RISC-V architecture allows microarchitectures to cache invalid entries
> + * in the TLB, a "preventive" SFENCE.VMA must be issued to ensure that each core
> + * obtains the correct kernel mapping. arch_sync_kernel_mappings() ensures that kernel
> + * page table mappings created via vmap/vmalloc() are updated before switching MM.
> + */
> +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> +{
> +	if (start < VMALLOC_END && end > VMALLOC_START)


This test is too restrictive: it should also catch the range
[MODULES_VADDR, MODULES_END). Sorry, I did not notice that at first.


> +		flush_tlb_all();
> +}
> \ No newline at end of file


I have to admit that I *think* both your patch and mine are wrong: one 
of the problems that led to the removal of vmalloc_fault() is the 
possibility for tracing functions to actually allocate vmalloc regions 
in the vmalloc page-fault path, which could give rise to nested 
exceptions (see 
https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/).

Here, every time we allocate a vmalloc region, we send an IPI. If a 
vmalloc allocation happens in this path (if it is traced, for example), 
it will give rise to another IPI... and so on.

So I came to the conclusion that the only way to actually fix this issue 
is by resolving vmalloc faults very early in the page-fault path (by 
emitting an sfence.vma on uarchs that cache invalid entries), before the 
kernel stack is even accessed. That's the best solution since it would 
completely remove all the preventive sfence.vma in 
flush_cache_vmap()/arch_sync_kernel_mappings(); we would instead rely on 
faulting, which I assume should not happen a lot (?).

I'm implementing this solution, but I'm pretty sure it won't be ready 
for 6.5. In the meantime, we need either your patch or mine to fix your 
issue...
