[PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV

Andrew Waterman waterman at eecs.berkeley.edu
Tue Mar 16 08:29:35 GMT 2021


On Tue, Mar 16, 2021 at 12:32 AM Anup Patel <anup at brainfault.org> wrote:
>
> On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu <liu at jiuyang.me> wrote:
> >
> > > As per my understanding, we don't need to explicitly invalidate local TLB
> > > in set_pte() or set_pte_at() because generic Linux page table management
> > > (<linux>/mm/*) will call the appropriate flush_tlb_xyz() function after page
> > > table updates.
> >
> > I witnessed this bug in our micro-architecture: set_pte instruction is
> > still in the store buffer, no functions are inserting SFENCE.VMA in
> > the stack below, so TLB cannot witness this modification.
> > Here is my call stack:
> > set_pte
> > set_pte_at
> > map_vm_area
> > __vmalloc_area_node
> > __vmalloc_node_range
> > __vmalloc_node
> > __vmalloc_node_flags
> > vzalloc
> > n_tty_open
> >
> > I think this is an architecture specific code, so <linux>/mm/* should
> > not be modified.
> > And spec requires SFENCE.VMA to be inserted on each modification to
> > TLB. So I added code here.
>
> The generic linux/mm/* already calls the appropriate tlb_flush_xyz()
> function defined in arch/riscv/include/asm/tlbflush.h
>
> Better to have a write-barrier in set_pte().
>
> >
> > > Also, just local TLB flush is generally not sufficient because
> > > a lot of page tables will be used across multiple HARTs.
> >
> > Yes, this is the biggest issue, in RISC-V Volume 2, Privileged Spec v.
> > 20190608 page 67 gave a solution:
>
> This is not an issue with RISC-V privilege spec rather it is more about
> placing RISC-V fences at right locations.
>
> > Consequently, other harts must be notified separately when the
> > memory-management data structures have been modified. One approach is
> > to use
> > 1) a local data fence to ensure local writes are visible globally,
> > then 2) an interprocessor interrupt to the other thread,
> > then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
> > and finally 4) signal back to originating thread that operation is
> > complete. This is, of course, the RISC-V analog to a TLB shootdown.
>
> I would suggest trying approach#1.
>
> You can include "asm/barrier.h" here and use wmb() or __smp_wmb()
> in-place of local TLB flush.

wmb() doesn't suffice to order older stores before younger page-table
walks, so that might hide the problem without actually fixing it.
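
To illustrate the distinction, here is a toy model in plain C. It is only a
sketch under stated assumptions: struct toy_hart and the toy_*() helpers are
hypothetical names, not kernel APIs, and the model pretends that a buffered
store never drains by itself. On real hardware a fence may happen to drain
the store buffer, which is how wmb() could mask the bug without the
architecture guaranteeing anything about later page-table walks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one hart with a one-entry store buffer. */
struct toy_hart {
	uint64_t pte_in_memory;  /* what a page-table walk reads */
	uint64_t buffered_pte;   /* pending store, not yet visible to the walker */
	bool     store_pending;
};

/* set_pte(): the store is issued but may linger in the store buffer. */
void toy_set_pte(struct toy_hart *h, uint64_t pte)
{
	h->buffered_pte = pte;
	h->store_pending = true;
}

/*
 * wmb(): orders older stores before younger *stores*. It says nothing
 * about younger page-table walks, so the model leaves the buffer alone.
 */
void toy_wmb(struct toy_hart *h)
{
	(void)h;
}

/* SFENCE.VMA: older stores become visible to younger page-table walks. */
void toy_sfence_vma(struct toy_hart *h)
{
	if (h->store_pending) {
		h->pte_in_memory = h->buffered_pte;
		h->store_pending = false;
	}
}

/* The walker reads memory, never the store buffer. */
uint64_t toy_page_walk(const struct toy_hart *h)
{
	return h->pte_in_memory;
}

/* Walk through the scenario described in the bug report. */
void toy_demo(void)
{
	struct toy_hart h = { 0, 0, false };

	toy_set_pte(&h, 0x1234);
	toy_wmb(&h);
	assert(toy_page_walk(&h) == 0);       /* walker still sees the old PTE */

	toy_sfence_vma(&h);
	assert(toy_page_walk(&h) == 0x1234);  /* now the new PTE is visible */
}
```

Calling toy_demo() exercises both cases: in this model the walk after
toy_wmb() still returns the old PTE, and only toy_sfence_vma() makes the
new one visible.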

Based upon Jiuyang's description, it does sound plausible that we are
missing an SFENCE.VMA (or TLB shootdown) somewhere.  But I don't
understand the situation well enough to know where that might be, or
what the best fix is.
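
For reference, the four-step shootdown sequence quoted from the privileged
spec can be sketched as a single-threaded toy model. Everything here is an
assumption made for illustration: struct sd_hart, SD_NR_HARTS and the sd_*()
helpers are hypothetical names, the "IPI" is a plain function call, and each
hart's TLB holds a single translation. It is not how the kernel or the SBI
implements remote fences.

```c
#include <stdbool.h>
#include <stdint.h>

#define SD_NR_HARTS 4

/* Toy model only: each hart caches one translation in its "TLB". */
struct sd_hart {
	uint64_t cached_pte;
	bool     tlb_valid;
	bool     ack;
};

static uint64_t sd_pte;                      /* shared in-memory PTE */
static struct sd_hart sd_harts[SD_NR_HARTS];

/* Use the TLB entry if valid, otherwise "walk" and refill it. */
uint64_t sd_translate(int hart)
{
	struct sd_hart *h = &sd_harts[hart];

	if (!h->tlb_valid) {
		h->cached_pte = sd_pte;
		h->tlb_valid = true;
	}
	return h->cached_pte;
}

/* Steps 3 and 4: local SFENCE.VMA on the remote hart, then signal back. */
void sd_ipi_handler(int hart)
{
	sd_harts[hart].tlb_valid = false;
	sd_harts[hart].ack = true;
}

/* Returns true once every other hart has acknowledged the shootdown. */
bool sd_shootdown(int self, uint64_t new_pte)
{
	bool all_acked = true;
	int i;

	sd_pte = new_pte;
	/* Step 1: a local data fence would go here; the model is sequential. */
	sd_harts[self].tlb_valid = false;    /* local SFENCE.VMA */

	for (i = 0; i < SD_NR_HARTS; i++) {
		if (i == self)
			continue;
		sd_harts[i].ack = false;
		sd_ipi_handler(i);           /* step 2: "IPI" to the other hart */
	}

	for (i = 0; i < SD_NR_HARTS; i++) {
		if (i != self && !sd_harts[i].ack)
			all_acked = false;   /* step 4 not yet complete */
	}
	return all_acked;
}
```

In this model, a hart that translated before the update keeps returning the
stale PTE from sd_translate() until sd_shootdown() has run, which is the
behaviour a purely local flush would leave in place on the other harts.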


>
> >
> > In general, this patch didn't handle the G bit in PTE, kernel trap it
> > to sbi_remote_sfence_vma. do you think I should use flush_tlb_all?
> >
> > Jiuyang
> >
> >
> >
> >
> > arch/arm/mm/mmu.c
> > void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >                               pte_t *ptep, pte_t pteval)
> > {
> >         unsigned long ext = 0;
> >
> >         if (addr < TASK_SIZE && pte_valid_user(pteval)) {
> >                 if (!pte_special(pteval))
> >                         __sync_icache_dcache(pteval);
> >                 ext |= PTE_EXT_NG;
> >         }
> >
> >         set_pte_ext(ptep, pteval, ext);
> > }
> >
> > arch/mips/include/asm/pgtable.h
> > static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >                               pte_t *ptep, pte_t pteval)
> > {
> >
> >         if (!pte_present(pteval))
> >                 goto cache_sync_done;
> >
> >         if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> >                 goto cache_sync_done;
> >
> >         __update_cache(addr, pteval);
> > cache_sync_done:
> >         set_pte(ptep, pteval);
> > }
> >
> >
> >
> >
> > On Tue, Mar 16, 2021 at 5:05 AM Anup Patel <anup at brainfault.org> wrote:
> > >
> > > +Alex
> > >
> > > On Tue, Mar 16, 2021 at 9:20 AM Jiuyang Liu <liu at jiuyang.me> wrote:
> > > >
> > > > This patch inserts SFENCE.VMA after modifying PTE based on RISC-V
> > > > specification.
> > > >
> > > > arch/riscv/include/asm/pgtable.h:
> > > > > 1. implement pte_user, pte_global and pte_leaf to check the
> > > > > corresponding attribute of a pte_t.
> > >
> > > Adding pte_user(), pte_global(), and pte_leaf() is fine.
> > >
> > > >
> > > > 2. insert SFENCE.VMA in set_pte_at based on RISC-V Volume 2, Privileged
> > > > Spec v. 20190608 page 66 and 67:
> > > > If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
> > > > rs1=x0. If any PTE along the traversal path had its G bit set, rs2 must
> > > > be x0; otherwise, rs2 should be set to the ASID for which the
> > > > translation is being modified.
> > > > If software modifies a leaf PTE, it should execute SFENCE.VMA with rs1
> > > > set to a virtual address within the page. If any PTE along the traversal
> > > > path had its G bit set, rs2 must be x0; otherwise, rs2 should be set to
> > > > the ASID for which the translation is being modified.
> > > >
> > > > arch/riscv/include/asm/tlbflush.h:
> > > > 1. implement get_current_asid to get current program asid.
> > > > 2. implement local_flush_tlb_asid to flush tlb with asid.
> > >
> > > As per my understanding, we don't need to explicitly invalidate local TLB
> > > > in set_pte() or set_pte_at() because generic Linux page table management
> > > > (<linux>/mm/*) will call the appropriate flush_tlb_xyz() function after page
> > > > table updates. Also, just local TLB flush is generally not sufficient because
> > > > a lot of page tables will be used across multiple HARTs.
> > >
> > > >
> > > > Signed-off-by: Jiuyang Liu <liu at jiuyang.me>
> > > > ---
> > > >  arch/riscv/include/asm/pgtable.h  | 27 +++++++++++++++++++++++++++
> > > >  arch/riscv/include/asm/tlbflush.h | 12 ++++++++++++
> > > >  2 files changed, 39 insertions(+)
> > > >
> > > > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > > index ebf817c1bdf4..5a47c60372c1 100644
> > > > --- a/arch/riscv/include/asm/pgtable.h
> > > > +++ b/arch/riscv/include/asm/pgtable.h
> > > > @@ -222,6 +222,16 @@ static inline int pte_write(pte_t pte)
> > > >         return pte_val(pte) & _PAGE_WRITE;
> > > >  }
> > > >
> > > > +static inline int pte_user(pte_t pte)
> > > > +{
> > > > +       return pte_val(pte) & _PAGE_USER;
> > > > +}
> > > > +
> > > > +static inline int pte_global(pte_t pte)
> > > > +{
> > > > +       return pte_val(pte) & _PAGE_GLOBAL;
> > > > +}
> > > > +
> > > >  static inline int pte_exec(pte_t pte)
> > > >  {
> > > >         return pte_val(pte) & _PAGE_EXEC;
> > > > @@ -248,6 +258,11 @@ static inline int pte_special(pte_t pte)
> > > >         return pte_val(pte) & _PAGE_SPECIAL;
> > > >  }
> > > >
> > > > +static inline int pte_leaf(pte_t pte)
> > > > +{
> > > > +       return pte_val(pte) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC);
> > > > +}
> > > > +
> > > >  /* static inline pte_t pte_rdprotect(pte_t pte) */
> > > >
> > > >  static inline pte_t pte_wrprotect(pte_t pte)
> > > > @@ -358,6 +373,18 @@ static inline void set_pte_at(struct mm_struct *mm,
> > > >                 flush_icache_pte(pteval);
> > > >
> > > >         set_pte(ptep, pteval);
> > > > +
> > > > +       if (pte_present(pteval)) {
> > > > +               if (pte_leaf(pteval)) {
> > > > +                       local_flush_tlb_page(addr);
> > > > +               } else {
> > > > +                       if (pte_global(pteval))
> > > > +                               local_flush_tlb_all();
> > > > +                       else
> > > > +                               local_flush_tlb_asid();
> > > > +
> > > > +               }
> > > > +       }
> > > >  }
> > > >
> > > >  static inline void pte_clear(struct mm_struct *mm,
> > > > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> > > > index 394cfbccdcd9..1f9b62b3670b 100644
> > > > --- a/arch/riscv/include/asm/tlbflush.h
> > > > +++ b/arch/riscv/include/asm/tlbflush.h
> > > > @@ -21,6 +21,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
> > > >  {
> > > >         __asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
> > > >  }
> > > > +
> > > > +static inline unsigned long get_current_asid(void)
> > > > +{
> > > > +       return (csr_read(CSR_SATP) >> SATP_ASID_SHIFT) & SATP_ASID_MASK;
> > > > +}
> > > > +
> > > > +static inline void local_flush_tlb_asid(void)
> > > > +{
> > > > +       unsigned long asid = get_current_asid();
> > > > +       __asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (asid) : "memory");
> > > > +}
> > > > +
> > > >  #else /* CONFIG_MMU */
> > > >  #define local_flush_tlb_all()                  do { } while (0)
> > > >  #define local_flush_tlb_page(addr)             do { } while (0)
> > > > --
> > > > 2.30.2
> > > >
> > > >
> > >
> > > Regards,
> > > Anup
>
> Regards,
> Anup


