[PATCH v4 0/4] Optimize mprotect() for large folios
Lorenzo Stoakes
lorenzo.stoakes at oracle.com
Mon Jun 30 04:17:19 PDT 2025
On Sat, Jun 28, 2025 at 05:04:31PM +0530, Dev Jain wrote:
> This patchset optimizes the mprotect() system call for large folios
> by PTE-batching. No issues were observed with mm-selftests, build
> tested on x86_64.
This should also be runtime tested on x86-64, not only build tested :)
You are still not really giving details here, so same comment as on your mremap()
series: please explain why you're doing this, what benefits you expect to
achieve, and on which platforms.
E.g. 'this is designed to optimise mTHP cases on arm64; we expect to see
benefits on amd64 also, and for intel there should be no impact'.
It's probably also worth actually going and checking to make sure that this is
the case re: other arches. See below on that...
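A quick way to survey which arches carry their own ptep_modify_prot_start()/
ptep_modify_prot_commit() overrides is something like the below, run from the
root of a kernel tree (a plain-grep sketch; adjust the pattern as needed):

```shell
# List the architectures defining their own ptep_modify_prot_start,
# by taking the second path component (arch/<name>/...) of each hit.
grep -rl "ptep_modify_prot_start" arch/ | cut -d/ -f2 | sort -u
```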
>
> We use the following test cases to measure performance, mprotect()'ing
> the mapped memory to read-only then read-write 40 times:
>
> Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then
> pte-mapping those THPs
> Test case 2: Mapping 1G of memory with 64K mTHPs
> Test case 3: Mapping 1G of memory with 4K pages
>
> Average execution time on arm64, Apple M3:
> Before the patchset:
> T1: 7.9 seconds T2: 7.9 seconds T3: 4.2 seconds
>
> After the patchset:
> T1: 2.1 seconds T2: 2.2 seconds T3: 4.3 seconds
>
> Comparing T1/T2 against T3 before the patchset, we also remove the
> regression introduced by ptep_get() on a contpte block. And for large
> folios we get an almost 74% performance improvement, the trade-off being
> a slight degradation in the small folio case.
This is nice, though order-0 is probably going to be your bread and butter no?
Having said that, mprotect() is not a hot path, this delta is small enough to
quite possibly just be noise, and personally I'm not all that bothered.
But let's run this same test on x86-64 too please and get some before/after
numbers just to confirm no major impact.
Thanks for including code.
>
> Here is the test program:
>
> #define _GNU_SOURCE
> #include <sys/mman.h>
> #include <stdlib.h>
> #include <string.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #define SIZE (1024*1024*1024)
>
> unsigned long pmdsize = (1UL << 21);
> unsigned long pagesize = (1UL << 12);
>
> static void pte_map_thps(char *mem, size_t size)
> {
> size_t offs;
> int ret = 0;
>
> /* PTE-map each THP by temporarily splitting the VMAs. */
> for (offs = 0; offs < size; offs += pmdsize) {
> ret |= madvise(mem + offs, pagesize, MADV_DONTFORK);
> ret |= madvise(mem + offs, pagesize, MADV_DOFORK);
> }
>
> if (ret) {
> fprintf(stderr, "ERROR: madvise() failed\n");
> exit(1);
> }
> }
>
> int main(int argc, char *argv[])
> {
> char *p;
> p = mmap((void *)(1UL << 30), SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> if (p != (char *)(1UL << 30)) {
> perror("mmap");
> return 1;
> }
>
> memset(p, 0, SIZE);
> if (madvise(p, SIZE, MADV_NOHUGEPAGE))
> perror("madvise");
> explicit_bzero(p, SIZE);
> pte_map_thps(p, SIZE);
>
> for (int loops = 0; loops < 40; loops++) {
> if (mprotect(p, SIZE, PROT_READ))
> perror("mprotect"), exit(1);
> if (mprotect(p, SIZE, PROT_READ|PROT_WRITE))
> perror("mprotect"), exit(1);
> explicit_bzero(p, SIZE);
> }
> }
>
> ---
> The patchset is rebased onto Saturday's mm-new.
>
> v3->v4:
> - Refactor skipping logic into a new function, edit patch 1 subject
> to highlight it is only for MM_CP_PROT_NUMA case (David H)
> - Refactor the optimization logic, add more documentation to the generic
> batched functions, do not add clear_flush_ptes, squash patch 4
> and 5 (Ryan)
>
> v2->v3:
> - Add comments for the new APIs (Ryan, Lorenzo)
> - Instead of refactoring, use a "skip_batch" label
> - Move arm64 patches at the end (Ryan)
> - In can_change_pte_writable(), check AnonExclusive page-by-page (David H)
> - Resolve implicit declaration; tested build on x86 (Lance Yang)
>
> v1->v2:
> - Rebase onto mm-unstable (6ebffe676fcf: util_macros.h: make the header more resilient)
> - Abridge the anon-exclusive condition (Lance Yang)
>
> Dev Jain (4):
> mm: Optimize mprotect() for MM_CP_PROT_NUMA by batch-skipping PTEs
> mm: Add batched versions of ptep_modify_prot_start/commit
> mm: Optimize mprotect() by PTE-batching
> arm64: Add batched versions of ptep_modify_prot_start/commit
>
> arch/arm64/include/asm/pgtable.h | 10 ++
> arch/arm64/mm/mmu.c | 28 +++-
> include/linux/pgtable.h | 83 +++++++++-
> mm/mprotect.c | 269 +++++++++++++++++++++++--------
> 4 files changed, 315 insertions(+), 75 deletions(-)
>
> --
> 2.30.2
>