[PATCH] riscv: mm: Implement pmdp_collapse_flush for THP
Alexandre Ghiti
alex at ghiti.fr
Thu Jan 26 07:33:01 PST 2023
Hi Mayuresh,
On 1/25/23 13:55, Mayuresh Chitale wrote:
> When THP is enabled, 4K pages are collapsed into a single huge
> page using the generic pmdp_collapse_flush() which will further
> use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
> the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> using address specific SFENCEs which results in repetitive (or
> unpredictable) page faults on RISC-V implementations which cache
> non-leaf PTEs.
That's interesting! I'm wondering if the same issue will happen if a
user maps 4K, unmaps it and at the same address maps a 2MB hugepage: I'm
not sure the mm code would correctly flush the non-leaf PTE when
unmapping the 4KB page. In that case, your patch only fixes the THP
usecase and maybe we should try to catch this non-leaf -> leaf upgrade
at some lower level page table functions, what do you think?
Alex
> Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> cached leaf and non-leaf PTEs are invalidated by using non-address
> specific SFENCEs as recommended by the RISC-V privileged specification.
>
> Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> Signed-off-by: Mayuresh Chitale <mchitale at ventanamicro.com>
> ---
> arch/riscv/include/asm/pgtable.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4eba9a98d0e3..6d948dec6020 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
> }
> +
> +#define pmdp_collapse_flush pmdp_collapse_flush
> +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> + unsigned long address, pmd_t *pmdp)
> +{
> + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> +
> + /*
> + * When leaf PTE enteries (regular pages) are collapsed into a leaf
> + * PMD entry (huge page), a valid non-leaf PTE is converted into a
> + * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
> + * specification allows implementations to cache valid non-leaf PTEs,
> + * but the section "4.2.1 Supervisor Memory-Management Fence
> + * Instruction" recommends the following:
> + * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
> + * with rs1=x0. If any PTE along the traversal path had its G bit set,
> + * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
> + * the translation is being modified."
> + * Based on the above recommendation, we should do full flush whenever
> + * leaf PTE entries are collapsed into a leaf PMD entry.
> + */
> + flush_tlb_mm(vma->vm_mm);
> + return pmd;
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> /*
More information about the linux-riscv
mailing list