[PATCH v2] riscv: mm: Implement pmdp_collapse_flush for THP

Anup Patel apatel at ventanamicro.com
Fri Feb 3 09:23:54 PST 2023


On Thu, Feb 2, 2023 at 10:05 AM Palmer Dabbelt <palmer at dabbelt.com> wrote:
>
> On Sun, 29 Jan 2023 23:48:15 PST (-0800), mchitale at ventanamicro.com wrote:
> > When THP is enabled, 4K pages are collapsed into a single huge
> > page using the generic pmdp_collapse_flush() which will further
> > use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
> > the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
> > using address specific SFENCEs which results in repetitive (or
> > unpredictable) page faults on RISC-V implementations which cache
> > non-leaf PTEs.
> >
> > Provide a RISC-V specific pmdp_collapse_flush() which ensures both
> > cached leaf and non-leaf PTEs are invalidated by using non-address
> > specific SFENCEs as recommended by the RISC-V privileged specification.
> >
> > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > Signed-off-by: Mayuresh Chitale <mchitale at ventanamicro.com>
> > ---
> >  arch/riscv/include/asm/pgtable.h |  4 ++++
> >  arch/riscv/mm/pgtable.c          | 26 ++++++++++++++++++++++++++
> >  2 files changed, 30 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 4eba9a98d0e3..3e01f4f3ab08 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -721,6 +721,10 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> >       page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
> >       return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
> >  }
> > +
> > +#define pmdp_collapse_flush pmdp_collapse_flush
> > +extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> > +                              unsigned long address, pmd_t *pmdp);
> >  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> >  /*
> > diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
> > index 6645ead1a7c1..5da1916c231e 100644
> > --- a/arch/riscv/mm/pgtable.c
> > +++ b/arch/riscv/mm/pgtable.c
> > @@ -81,3 +81,29 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
> >  }
> >
> >  #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> > +                                     unsigned long address, pmd_t *pmdp)
> > +{
> > +     pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
> > +
> > +     VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> > +     VM_BUG_ON(pmd_trans_huge(*pmdp));
> > +     /*
> > +      * When leaf PTE enteries (regular pages) are collapsed into a leaf
> > +      * PMD entry (huge page), a valid non-leaf PTE is converted into a
> > +      * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
> > +      * specification allows implementations to cache valid non-leaf PTEs,
> > +      * but the section "4.2.1 Supervisor Memory-Management Fence
> > +      * Instruction" recommends the following:
> > +      * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
> > +      * with rs1=x0. If any PTE along the traversal path had its G bit set,
> > +      * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
> > +      * the translation is being modified."
> > +      * Based on the above recommendation, we should do full flush whenever
> > +      * leaf PTE entries are collapsed into a leaf PMD entry.
>
> It's generally best to ignore the recommendations in the commentary
> about flushes; they assume a specific software model that doesn't always
> apply to Linux.  In this case we do need the fence, but I changed the
> comment to explain it differently (and fixed at least one spelling
> mistake).
>
> I've squashed this in and have it in staging, if that looks good I'll
> put it on fixes.  Thanks!
>
> diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
> index 5da1916c231e..fef4e7328e49 100644
> --- a/arch/riscv/mm/pgtable.c
> +++ b/arch/riscv/mm/pgtable.c
> @@ -90,18 +90,12 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
>         VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>         VM_BUG_ON(pmd_trans_huge(*pmdp));
>         /*
> -        * When leaf PTE enteries (regular pages) are collapsed into a leaf
> +        * When leaf PTE entries (regular pages) are collapsed into a leaf
>          * PMD entry (huge page), a valid non-leaf PTE is converted into a
> -        * valid leaf PTE at the level 1 page table. The RISC-V privileged v1.12
> -        * specification allows implementations to cache valid non-leaf PTEs,
> -        * but the section "4.2.1 Supervisor Memory-Management Fence
> -        * Instruction" recommends the following:
> -        * "If software modifies a non-leaf PTE, it should execute SFENCE.VMA
> -        * with rs1=x0. If any PTE along the traversal path had its G bit set,
> -        * rs2 must be x0; otherwise, rs2 should be set to the ASID for which
> -        * the translation is being modified."
> -        * Based on the above recommendation, we should do full flush whenever
> -        * leaf PTE entries are collapsed into a leaf PMD entry.
> +        * valid leaf PTE at the level 1 page table.  Since the sfence.vma
> +        * forms that specify an address only apply to leaf PTEs, we need a
> +        * global flush here.  collapse_huge_page() assumes these flushes are
> +        * eager, so just do the fence here.
>          */
>         flush_tlb_mm(vma->vm_mm);
>         return pmd;

This looks good to me. Please add it to the list of fixes.

Regards,
Anup

>
>
> > +      */
> > +     flush_tlb_mm(vma->vm_mm);
> > +     return pmd;
> > +}
> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
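
To illustrate why the address-specific flush is not enough here, below is
a small user-space toy model, not kernel code. Everything in it (the
toy_* names, the slot count, the idea of a single cache holding both leaf
and non-leaf entries) is an assumption made up for this sketch; real
hardware differs. It only models the rule stated in the comment above:
sfence.vma forms that specify an address apply to leaf PTEs, while the
form without an address drops everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 0x1000UL   /* 4 KiB base page */
#define TOY_PMD_SIZE  0x200000UL /* 2 MiB huge page */
#define TOY_TLB_SLOTS 8          /* arbitrary size for the sketch */

enum toy_kind { TOY_LEAF, TOY_NONLEAF };

/* One cached translation: a leaf PTE or a non-leaf (pointer) PTE. */
struct toy_entry {
	bool valid;
	enum toy_kind kind;
	unsigned long base; /* start of the virtual range it covers */
	unsigned long size; /* length of that range in bytes */
};

struct toy_tlb {
	struct toy_entry slot[TOY_TLB_SLOTS];
};

static void toy_tlb_insert(struct toy_tlb *tlb, enum toy_kind kind,
			   unsigned long base, unsigned long size)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		if (!tlb->slot[i].valid) {
			tlb->slot[i] = (struct toy_entry){ true, kind, base, size };
			return;
		}
	}
	assert(!"toy TLB is full");
}

/* Models sfence.vma with an address: only leaf entries covering va go. */
static void toy_flush_addr(struct toy_tlb *tlb, unsigned long va)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		struct toy_entry *e = &tlb->slot[i];

		if (e->valid && e->kind == TOY_LEAF &&
		    va >= e->base && va - e->base < e->size)
			e->valid = false;
	}
}

/* Models an address-specific range flush: one flush per 4 KiB page. */
static void toy_flush_range(struct toy_tlb *tlb, unsigned long start,
			    unsigned long end)
{
	for (unsigned long va = start; va < end; va += TOY_PAGE_SIZE)
		toy_flush_addr(tlb, va);
}

/* Models sfence.vma without an address: every cached entry goes. */
static void toy_flush_all(struct toy_tlb *tlb)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++)
		tlb->slot[i].valid = false;
}

static size_t toy_count(const struct toy_tlb *tlb, enum toy_kind kind)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_TLB_SLOTS; i++)
		if (tlb->slot[i].valid && tlb->slot[i].kind == kind)
			n++;
	return n;
}

/* State before the collapse: one non-leaf entry plus two 4 KiB leaves. */
static void toy_fill_before_collapse(struct toy_tlb *tlb, unsigned long base)
{
	memset(tlb, 0, sizeof(*tlb));
	toy_tlb_insert(tlb, TOY_NONLEAF, base, TOY_PMD_SIZE);
	toy_tlb_insert(tlb, TOY_LEAF, base, TOY_PAGE_SIZE);
	toy_tlb_insert(tlb, TOY_LEAF, base + 3 * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
}
```

In this model, calling toy_fill_before_collapse() and then
toy_flush_range() over the 2 MiB region removes both leaf entries but
leaves the non-leaf entry cached, which corresponds to the stale state
described in the commit message. Calling toy_flush_all() instead leaves
nothing behind, which is the role flush_tlb_mm() plays in the patch.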


