vmalloc faulting on RISC-V

Palmer Dabbelt palmer at dabbelt.com
Wed Sep 9 16:35:11 EDT 2020


On Tue, 08 Sep 2020 23:29:10 PDT (-0700), penberg at kernel.org wrote:
> Hello!
>
> Why does RISC-V need vmalloc faulting in do_page_fault()? If I
> understand correctly, some architectures implement it because process
> page tables can get out of sync with "init_mm.pgd". How does that
> happen on RISC-V?

RISC-V requires an sfence.vma when upgrading a mapping from invalid to valid,
so we need to do something.  Our two options are to eagerly sfence.vma on every
such upgrade (IIRC there's a comment in the code about how one might do so) or
to lazily handle the faults that arise.
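
To give a feel for the lazy option, here's a condensed sketch of the
fault-handling path, loosely modeled on the vmalloc_fault() logic in
arch/riscv/mm/fault.c (simplified: the real code goes on to walk and check the
lower levels before returning, and the exact SATP handling varies by kernel
version):

    static void vmalloc_fault(struct pt_regs *regs, int code, unsigned long addr)
    {
            pgd_t *pgd, *pgd_k;
            unsigned long index = pgd_index(addr);

            /* Only kernel-mode faults on vmalloc addresses qualify. */
            if (user_mode(regs))
                    return do_trap(regs, SIGSEGV, code, addr);

            /*
             * Synchronize this hart's top-level page table with the
             * reference page table: the vmalloc mapping was installed
             * in init_mm.pgd, but the page table SATP points at may
             * hold a stale copy of that kernel PGD entry.
             */
            pgd = (pgd_t *)pfn_to_virt(csr_read(CSR_SATP) & SATP_PPN) + index;
            pgd_k = init_mm.pgd + index;
            if (!pgd_present(*pgd_k))
                    return do_trap(regs, SIGSEGV, code, addr);
            set_pgd(pgd, *pgd_k);

            /* Drop any cached invalid translation for this address. */
            local_flush_tlb_page(addr);
    }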

On Rocket these faults are unlikely to manifest on the vmalloc region because
we don't speculatively fill the DTLB, memory accesses are in order, and we
don't regularly reference invalid vmalloc pages.  The DTLB does cache invalid
mappings (IIRC that helps the critical path because it decouples the stall
logic from the address calculation logic, both of which are very tight in the
M stage of a canonical 5-stage pipeline), so the faults can and do show up,
just not that often.

Since the faults are rare and sfence.vma is expensive, we decided it would be
a net win to elide the fences and handle the resulting faults.  IIRC we don't
have any benchmarks to back that up, but intuitively it still smells like a
reasonable decision.
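
For reference, the fence being elided is cheap to express but not necessarily
cheap to execute.  The local, single-address flavor in
arch/riscv/include/asm/tlbflush.h is roughly:

    static inline void local_flush_tlb_page(unsigned long addr)
    {
            /* sfence.vma with rs1 = addr, rs2 = x0: order prior stores
             * to the page tables before later implicit translations of
             * addr on this hart, for all address spaces. */
            __asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
    }

An eager scheme would need this (or a cross-hart broadcast via IPIs or the
SBI) on every invalid-to-valid upgrade of a kernel mapping, which is exactly
the cost we're trying to avoid.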

> I am asking because Joerg Roedel recently switched the x86
> architecture to a different approach because apparently vmalloc
> faulting is error-prone:
>
> commit 7f0a002b5a21302d9f4b29ba83c96cd433ff3769
> Author: Joerg Roedel <jroedel at suse.de>
> Date:   Mon Jun 1 21:52:40 2020 -0700
>
>     x86/mm: remove vmalloc faulting
>
>     Remove fault handling on vmalloc areas, as the vmalloc code now takes
>     care of synchronizing changes to all page-tables in the system.
>
> If RISC-V has the issue of page tables getting out of sync, I think we
> should switch to this approach too.

If it's actually error-prone then we'll need to switch over, but I'd
anticipate we'd pay a performance hit on existing hardware, so I'd prefer to
fix the bugs if possible.  My guess would be that the bugs are very
ISA-specific and the performance tradeoffs are very implementation-specific,
so while the x86 folks are usually quite solid on these things, their
reasoning may not apply to our use cases.

If there really is some reason shared between RISC-V and x86 that makes the
fault-handling approach infeasible, then we'll have to switch as well.
Knowing why x86 changed their approach would be the first step.
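
For what it's worth, if I remember the series right, what made the removal
safe on x86-64 was pre-allocating the vmalloc-area page tables at boot, so the
top-level entries in init_mm.pgd never change after init and the per-process
copies made at fork can't go stale.  A hypothetical RISC-V equivalent (the
function name is illustrative, and the folded-level details would need
checking against Sv39/Sv48) might look like:

    void __init preallocate_vmalloc_tables(void)
    {
            unsigned long addr;

            /*
             * Populate the second level for the whole vmalloc range so
             * the kernel PGD entries never go from none to present
             * after boot.  With folded levels, p4d_alloc()/pud_alloc()
             * allocate at the first level that really exists.
             */
            for (addr = VMALLOC_START; addr <= VMALLOC_END;
                 addr = ALIGN(addr + 1, PGDIR_SIZE)) {
                    pgd_t *pgd = pgd_offset_k(addr);
                    p4d_t *p4d = p4d_alloc(&init_mm, pgd, addr);

                    if (!p4d || !pud_alloc(&init_mm, p4d, addr))
                            panic("Failed to pre-allocate vmalloc page tables");
            }
    }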


