[PATCH v4 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
Dev Jain
dev.jain at arm.com
Wed Jun 24 23:37:11 PDT 2026
On 18/06/26 2:17 pm, Wen Jiang wrote:
> This patchset accelerates ioremap, vmalloc, and vmap when the memory
> is physically fully or partially contiguous. Two techniques are used:
>
> 1. Avoid page table rewalk when setting PTEs/PMDs for multiple memory
> segments
> 2. Use batched mappings wherever possible in both vmalloc and ARM64
> layers
>
> Besides accelerating the mapping path, this also enables large
> mappings (PMD and cont-PTE) for vmap, which are currently not
> supported.
>
> Patches 1-2 extend ARM64 vmalloc CONT-PTE mapping to support multiple
> CONT-PTE regions instead of just one.
>
> Patch 3 extracts a common helper vmap_set_ptes() that consolidates PTE
> mapping logic between the ioremap and vmalloc/vmap paths, handling both
> CONT_PTE and regular PTE mappings. This prepares for the next patch.
>
> Patch 4 extends the page table walk path to support page shifts other
> than PAGE_SHIFT and eliminates the page table rewalk for huge vmalloc
> mappings. The function is renamed from vmap_small_pages_range_noflush()
> to vmap_pages_range_noflush_walk().
>
> Patches 5-6 add huge vmap support for contiguous pages, including
> support for non-compound pages with pfn alignment verification.
>
> On the RK3588 8-core ARM64 SoC, with tasks pinned to a little core and
> the performance CPUfreq policy enabled, benchmark results:
>
> * ioremap(1 MB): 1.35x faster (3407 ns -> 2526 ns)
> * vmalloc(1 MB) mapping time (excluding allocation) with
> VM_ALLOW_HUGE_VMAP: 1.42x faster (5.00 us -> 3.53us)
> * vmap(100MB) with order-8 pages: 8.3x faster (1235 us -> 149 us)
>
> Many thanks to Xueyuan Chen for his testing efforts on RK3588 boards.
>
I am still a little nervous about doing vmap-huge by default.
We can play set_memory_* games on a vmap huge mapping partially, thus
forcing a pgtable split, and not all arches can handle a kernel pgtable
split.
For arm64, we can handle that with BBML2_NOABORT, but interestingly, in
change_memory_common, arch/arm64/mm/pageattr.c:
area = find_vm_area((void *)addr);
if (!area ||
((unsigned long)kasan_reset_tag((void *)end) >
(unsigned long)kasan_reset_tag(area->addr) + area->size) ||
((area->flags & (VM_ALLOC | VM_ALLOW_HUGE_VMAP)) != VM_ALLOC))
return -EINVAL;
Even before my change fcf8dda8cc48, we were bailing out on
!(area->flags & VM_ALLOC))
So on arm64 we haven't been supporting set_memory_* for vmap memory at all, because
it has VM_MAP set and not VM_ALLOC. Although we have a contradictory comment above
this code so not sure if this was intentional:
"Let's restrict ourselves to mappings created by vmalloc (or vmap)."
So either there is no user in the kernel doing vmap + set_memory_* (looks like it
by doing an LLM scan), or it is not fatal for set_memory_* to fail.
But even if no one does it now, technically the API allows it.
>
More information about the linux-arm-kernel
mailing list