[PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

Wed Jul 22 05:43:50 EDT 2020

On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer at dabbelt.com> wrote:
>
> On Tue, 21 Jul 2020 11:36:10 PDT (-0700), alex at ghiti.fr wrote:
> > Let's try to make progress here: I add linux-mm in CC to get feedback on
> > this patch as it blocks sv48 support too.
>
> Sorry for being slow here.  I haven't replied because I hadn't really fleshed
> out the design yet, but just so everyone's on the same page my problems with
> this are:
>
> * We waste vmalloc space on 32-bit systems, where there isn't a lot of it.

There is actually an ongoing work to make 32-bit Arm kernels move
vmlinux into the vmalloc space, as part of the move to avoid highmem.

Overall, a 32-bit system would waste about 0.1% of its virtual address space
by having the kernel be located in both the linear map and the vmalloc area.
It's not zero, but not that bad either. With the typical split of 3072 MB user,
768MB linear and 256MB vmalloc, it's also around 1.5% of the available
vmalloc area (assuming a 4MB vmlinux in a typical 32-bit kernel), but the
boundaries can be changed arbitrarily if needed.

The eventual goal is to have a split of 3840MB for either user or linear map
plus and 256MB for vmalloc, including the kernel. Switching between linear
and user has a noticeable runtime overhead, but it relaxes both the limits
for user memory and lowmem, and it provides a somewhat stronger
address space isolation.

Another potential idea would be to completely randomize the physical
addresses underneath the kernel by using a random permutation of the
pages in the kernel image. This adds even more overhead (virt_to_phys
may need to call vmalloc_to_page or similar) and may cause problems
with DMA into kernel .data across page boundaries,

> * Sort out how to maintain a linear map as the canonical hole moves around
>   between the VA widths without adding a bunch of overhead to the virt2phys and
>   friends.  This is probably going to be the trickiest part, but I think if we
>   just change the page table code to essentially lie about VAs when an sv39
>   system runs an sv48+sv39 kernel we could make it work -- there'd be some
>   logical complexity involved, but it would remain fast.

I assume you can't use the trick that x86 has where all kernel addresses
are at the top of the 64-bit address space and user addresses are at the
bottom, regardless of the size of the page tables?

      Arnd