[PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for "preventive" TLB flush
Alexandre Ghiti
alex at ghiti.fr
Tue Aug 8 03:16:50 PDT 2023
Hi Dylan,
On 07/08/2023 10:23, Dylan Jhong wrote:
> Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> the correct kernel mapping.
>
> The patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
> page table mappings created via vmap/vmalloc() are updated before switching MM.
>
> Signed-off-by: Dylan Jhong <dylan at andestech.com>
> ---
> arch/riscv/include/asm/page.h | 2 ++
> arch/riscv/mm/tlbflush.c | 12 ++++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index b55ba20903ec..6c86ab69687e 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -21,6 +21,8 @@
> #define HPAGE_MASK (~(HPAGE_SIZE - 1))
> #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
>
> +#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PTE_MODIFIED
> +
> /*
> * PAGE_OFFSET -- the first address of the first page of memory.
> * When not using MMU this corresponds to the first free page in
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 77be59aadc73..d63364948c85 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> }
> #endif
> +
> +/*
> + * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> + * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> + * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
> + * page table mappings created via vmap/vmalloc() are updated before switching MM.
> + */
> +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> +{
> + if (start < VMALLOC_END && end > VMALLOC_START)
This test is too restrictive: it should also cover the range
[MODULES_VADDR, MODULES_END). Sorry, I did not notice that at first.
> + flush_tlb_all();
> +}
> \ No newline at end of file
I have to admit that I *think* both your patch and mine are wrong: one
of the problems that led to the removal of vmalloc_fault() is the
possibility for tracing functions to actually allocate vmalloc regions
in the vmalloc page fault path, which could give rise to nested
exceptions (see
https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/).
Here, every time we allocate a vmalloc region, we send an IPI. If a
vmalloc allocation happens in this path (if it is traced, for example),
it will give rise to another IPI... and so on.
So I came to the conclusion that the only way to actually fix this issue
is to resolve vmalloc faults very early in the page fault path (by
emitting an sfence.vma on uarchs that cache invalid entries), before the
kernel stack is even accessed. That's the best solution since it would
completely remove all the preventive sfence.vma in
flush_cache_vmap()/arch_sync_kernel_mappings(); we would instead rely on
faulting, which I assume should not happen often (?).
I'm implementing this solution, but I'm pretty sure it won't be ready
for 6.5. In the meantime, we need either your patch or mine to fix your
issue...