[PATCH] riscv: mm: Implement pmdp_collapse_flush for THP

mchitale at ventanamicro.com mchitale at ventanamicro.com
Sun Jan 29 23:34:50 PST 2023


On Fri, 2023-01-27 at 09:41 +0100, Alexandre Ghiti wrote:
> On 1/26/23 19:14, Anup Patel wrote:
> > On Thu, Jan 26, 2023 at 9:03 PM Alexandre Ghiti <alex at ghiti.fr>
> > wrote:
> > > Hi Mayuresh,
> > > 
> > > On 1/25/23 13:55, Mayuresh Chitale wrote:
> > > > When THP is enabled, 4K pages are collapsed into a single huge
> > > > page using the generic pmdp_collapse_flush() which will further
> > > > use flush_tlb_range() to shoot-down stale TLB entries.
> > > > Unfortunately,
> > > > the generic pmdp_collapse_flush() only invalidates cached leaf
> > > > PTEs
> > > > using address specific SFENCEs which results in repetitive (or
> > > > unpredictable) page faults on RISC-V implementations which
> > > > cache
> > > > non-leaf PTEs.
> > > 
> > > That's interesting! I'm wondering if the same issue will happen
> > > if a
> > > user maps 4K, unmaps it and at the same address maps a 2MB
> > > hugepage: I'm
> > > not sure the mm code would correctly flush the non-leaf PTE when
> > > unmapping the 4KB page. In that case, your patch only fixes the
> > > THP
> > > usecase and maybe we should try to catch this non-leaf -> leaf
> > > upgrade
> > > at some lower level page table functions, what do you think?
> > This issue can happen whenever existing/valid non-leaf PTE is
> > modified.
> > 
> > We hit this issue in the THP case but we can also hit this issue in
> > other
> > scenarios where the page table programming pattern is similar.
> 
> Then what about trying to get all those cases at once? We can easily 
> catch those spurious page faults as the pte would be valid: we would 
> just have to flush_tlb_mm at the first page fault, if any.

IMO, we can consider fixing those cases in a separate patch.
> 
> 
> > Regards,
> > Anup
> > 
> > > Alex
> > > 
> > > 
> > > > Provide a RISC-V specific pmdp_collapse_flush() which ensures
> > > > both
> > > > cached leaf and non-leaf PTEs are invalidated by using non-
> > > > address
> > > > specific SFENCEs as recommended by the RISC-V privileged
> > > > specification.
> > > > 
> > > > Fixes: e88b333142e4 ("riscv: mm: add THP support on 64-bit")
> > > > Signed-off-by: Mayuresh Chitale <mchitale at ventanamicro.com>
> > > > ---
> > > >    arch/riscv/include/asm/pgtable.h | 24
> > > > ++++++++++++++++++++++++
> > > >    1 file changed, 24 insertions(+)
> > > > 
> > > > diff --git a/arch/riscv/include/asm/pgtable.h
> > > > b/arch/riscv/include/asm/pgtable.h
> > > > index 4eba9a98d0e3..6d948dec6020 100644
> > > > --- a/arch/riscv/include/asm/pgtable.h
> > > > +++ b/arch/riscv/include/asm/pgtable.h
> > > > @@ -721,6 +721,30 @@ static inline pmd_t pmdp_establish(struct
> > > > vm_area_struct *vma,
> > > >        page_table_check_pmd_set(vma->vm_mm, address, pmdp,
> > > > pmd);
> > > >        return __pmd(atomic_long_xchg((atomic_long_t *)pmdp,
> > > > pmd_val(pmd)));
> > > >    }
> > > > +
> > > > +#define pmdp_collapse_flush pmdp_collapse_flush
> > > > +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct
> > > > *vma,
> > > > +                                     unsigned long address,
> > > > pmd_t *pmdp)
> > > > +{
> > > > +     pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, address,
> > > > pmdp);
> > > > +
> > > > +     /*
> > > > +      * When leaf PTE enteries (regular pages) are collapsed
> > > > into a leaf
> > > > +      * PMD entry (huge page), a valid non-leaf PTE is
> > > > converted into a
> > > > +      * valid leaf PTE at the level 1 page table. The RISC-V
> > > > privileged v1.12
> > > > +      * specification allows implementations to cache valid
> > > > non-leaf PTEs,
> > > > +      * but the section "4.2.1 Supervisor Memory-Management
> > > > Fence
> > > > +      * Instruction" recommends the following:
> > > > +      * "If software modifies a non-leaf PTE, it should
> > > > execute SFENCE.VMA
> > > > +      * with rs1=x0. If any PTE along the traversal path had
> > > > its G bit set,
> > > > +      * rs2 must be x0; otherwise, rs2 should be set to the
> > > > ASID for which
> > > > +      * the translation is being modified."
> > > > +      * Based on the above recommendation, we should do full
> > > > flush whenever
> > > > +      * leaf PTE entries are collapsed into a leaf PMD entry.
> > > > +      */
> > > > +     flush_tlb_mm(vma->vm_mm);
> > > > +     return pmd;
> > > > +}
> > > >    #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > > 
> > > >    /*
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv




More information about the linux-riscv mailing list