[PATCH v2 0/7] Optimize mprotect for large folios
Dev Jain
dev.jain at arm.com
Mon Apr 28 22:23:29 PDT 2025
This patchset optimizes the mprotect() system call for large folios
by PTE-batching.
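
To illustrate the general idea of PTE-batching, here is a toy userspace model. It is not the kernel code from this series: struct toy_pte, toy_batch_len() and toy_change_prot() are hypothetical names, and the batching criterion used here (present entries with the same protection and consecutive PFNs) is a simplifying assumption. The point is only that a run of entries belonging to one large folio can be handled as a single operation rather than entry by entry.

```c
/* Toy userspace model of PTE batching. NOT kernel code; all names are
 * hypothetical and exist only for illustration. */
#include <stddef.h>

struct toy_pte {
    unsigned long pfn;   /* page frame number this entry maps */
    unsigned int prot;   /* protection bits */
    int present;         /* nonzero if the entry is valid */
};

/* Count how many entries starting at ptes[start] form one batch:
 * present, same protection, and physically consecutive PFNs.
 * A non-present entry is treated as a batch of length 1. */
static size_t toy_batch_len(const struct toy_pte *ptes, size_t start, size_t n)
{
    size_t len = 1;

    if (start >= n)
        return 0;
    if (!ptes[start].present)
        return 1;

    while (start + len < n &&
           ptes[start + len].present &&
           ptes[start + len].prot == ptes[start].prot &&
           ptes[start + len].pfn == ptes[start].pfn + len)
        len++;
    return len;
}

/* Apply new_prot to all present entries, one batch at a time.
 * Returns the number of batch operations performed. */
static size_t toy_change_prot(struct toy_pte *ptes, size_t n,
                              unsigned int new_prot)
{
    size_t i = 0, ops = 0;

    while (i < n) {
        size_t len = toy_batch_len(ptes, i, n);

        if (ptes[i].present) {
            for (size_t j = 0; j < len; j++)
                ptes[i + j].prot = new_prot;
            ops++;   /* one operation per batch, not per entry */
        }
        i += len;    /* skip the whole batch in one step */
    }
    return ops;
}
```

In this model, 16 entries with consecutive PFNs followed by 4 scattered entries take 5 batch operations rather than 20 per-entry ones.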
We use the following test cases to measure performance, mprotect()'ing
the mapped memory to read-only then read-write 40 times:
Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then
pte-mapping those THPs
Test case 2: Mapping 1G of memory with 64K mTHPs
Test case 3: Mapping 1G of memory with 4K pages
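
A minimal sketch of the timing loop described above, assuming a Linux userspace program. This is not the actual test program used for the numbers in this cover letter, and time_mprotect_toggle() is a hypothetical helper name. The folio-size setup for each test case (PMD-THPs that are then pte-mapped, 64K mTHPs, or 4K pages) depends on system configuration and is not shown.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Map len bytes of anonymous memory, fault it in, then flip it between
 * read-only and read-write `iters` times. Returns the elapsed seconds
 * for the mprotect() loop only, or -1.0 on failure. */
static double time_mprotect_toggle(size_t len, int iters)
{
    struct timespec t0, t1;
    double secs = -1.0;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return -1.0;

    memset(buf, 1, len);  /* touch every page so the PTEs are populated */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        if (mprotect(buf, len, PROT_READ) ||
            mprotect(buf, len, PROT_READ | PROT_WRITE))
            goto out;     /* report failure as -1.0 */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out:
    munmap(buf, len);
    return secs;
}
```

Calling time_mprotect_toggle((size_t)1 << 30, 40) would correspond to the 1G mapping and 40 iterations mentioned above.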
Average execution time on arm64, Apple M3:
Before the patchset:
T1: 7.9 seconds, T2: 7.9 seconds, T3: 4.2 seconds
After the patchset:
T1: 2.1 seconds, T2: 2.2 seconds, T3: 4.2 seconds
Comparing T1/T2 against T3 before the patchset shows that large folios were
slower than 4K pages, a regression introduced by ptep_get() on a contpte
block; the patchset removes it. For large folios, execution time drops by
about 73% (T1: 7.9s -> 2.1s) and 72% (T2: 7.9s -> 2.2s).
v1->v2:
- Rebase onto mm-unstable (6ebffe676fcf: util_macros.h: make the header more resilient)
- Abridge the anon-exclusive condition (Lance Yang)
Dev Jain (7):
  mm: Refactor code in mprotect
  mm: Optimize mprotect() by batch-skipping PTEs
  mm: Add batched versions of ptep_modify_prot_start/commit
  arm64: Add batched version of ptep_modify_prot_start
  arm64: Add batched version of ptep_modify_prot_commit
  mm: Batch around can_change_pte_writable()
  mm: Optimize mprotect() through PTE-batching
arch/arm64/include/asm/pgtable.h | 10 ++
arch/arm64/mm/mmu.c | 21 +++-
include/linux/mm.h | 4 +-
include/linux/pgtable.h | 42 ++++++++
mm/gup.c | 2 +-
mm/huge_memory.c | 4 +-
mm/memory.c | 6 +-
mm/mprotect.c | 165 ++++++++++++++++++++-----------
mm/pgtable-generic.c | 16 ++-
9 files changed, 198 insertions(+), 72 deletions(-)
--
2.30.2