[PATCH v5 2/3] arm64: mmu: avoid allocating pages while splitting the linear mapping
Yeoreum Yun
yeoreum.yun at arm.com
Tue Jan 20 01:29:45 PST 2026
Hi Ryan,
> On 19/01/2026 21:24, Yeoreum Yun wrote:
> > Hi Will,
> >
> >> On Mon, Jan 05, 2026 at 08:23:27PM +0000, Yeoreum Yun wrote:
> >>> +static int __init linear_map_prealloc_split_pgtables(void)
> >>> +{
> >>> + int ret, i;
> >>> + unsigned long lstart = _PAGE_OFFSET(vabits_actual);
> >>> + unsigned long lend = PAGE_END;
> >>> + unsigned long kstart = (unsigned long)lm_alias(_stext);
> >>> + unsigned long kend = (unsigned long)lm_alias(__init_begin);
> >>> +
> >>> + const struct mm_walk_ops collect_to_split_ops = {
> >>> + .pud_entry = collect_to_split_pud_entry,
> >>> + .pmd_entry = collect_to_split_pmd_entry
> >>> + };
> >>
> >> Why do we need to rewalk the page-table here instead of collating the
> >> number of block mappings we put down when creating the linear map in
> >> the first place?
>
> That's a good point; perhaps we can reuse the counters that this series introduces?
>
> https://lore.kernel.org/all/20260107002944.2940963-1-yang@os.amperecomputing.com/
>
> >
> > First, the linear alias of [_text, __init_begin) is not a target for
> > the split, and it also seems strange to me to add code inside alloc_init_XXX()
> > that both checks an address range and counts the number of block mappings.
> >
> > Second, for a future feature,
> > I hope to add code that splits a "specific" area,
> > e.g. to set a specific pkey for that area.
>
> Could you give more detail on this? My working assumption is that either the
> system supports BBML2 or it doesn't. If it doesn't, we need to split the whole
> linear map. If it does, we already have logic to split parts of the linear map
> when needed.
This is not for the linear mapping case but for the "kernel text area".
As a draft, I want to mark some of the kernel code as executable by
both the kernel and eBPF programs.
(I'm trying to use the POE feature to prevent eBPF programs from
directly executing kernel code.)
For this "executable area" shared by the kernel and eBPF programs
(a typical example is the exception entry code), I need to split that
specific range and mark it with a special POE index.
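A rough illustration of why a range-restricted split stays cheap: assuming a 4 KiB granule and a region mapped with 2 MiB PMD blocks, only a block that straddles an unaligned boundary of [start, end) has to be split; blocks fully inside the range can keep their block entry and just take the new attribute (e.g. a POE index). The helper name `pmd_blocks_to_split()` is hypothetical and not part of the patch or of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB granule, so one PMD block maps 2 MiB. */
#define PMD_SHIFT 21
#define PMD_SIZE  (1ULL << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/*
 * Hypothetical helper: how many PMD blocks must be split down to PTEs so
 * that [start, end) can carry its own attributes. Blocks entirely inside
 * the range need no split; only blocks cut by an unaligned edge do.
 */
static unsigned int pmd_blocks_to_split(uint64_t start, uint64_t end)
{
	bool split_start, split_end;

	assert(start < end);
	split_start = (start & ~PMD_MASK) != 0;
	split_end = (end & ~PMD_MASK) != 0;

	/* Both edges fall inside the same block: a single split covers both. */
	if (split_start && split_end && (start & PMD_MASK) == (end & PMD_MASK))
		return 1;

	return (unsigned int)split_start + (unsigned int)split_end;
}
```

For example, a 2 MiB-aligned 4 MiB range needs no split, while a range starting 4 KiB into one block and ending 4 KiB into another needs two.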
>
> >
> > In this case, it's useful to rewalk the page table over the specific
> > range to get the number of block mappings.
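A userspace model of what a range-restricted rewalk would count, under stated assumptions: 4 KiB granule, a range mapped greedily with the largest block that fits (1 GiB PUD, then 2 MiB PMD, else PTEs), contiguous hints ignored, and every block split all the way down to PTEs. Under those assumptions a PUD block needs one PMD table plus 512 PTE tables, and a PMD block needs one PTE table. `count_split_tables()` and `struct split_count` are hypothetical names; the patch's own `collect_to_split_*_entry()` callbacks may count differently. Calling it once for [lstart, kstart) and once for [kend, lend) would mirror the two walks in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB granule, 512 entries per table level. */
#define PAGE_SIZE      (1ULL << 12)
#define PMD_SIZE       (1ULL << 21)
#define PUD_SIZE       (1ULL << 30)
#define PTRS_PER_TABLE 512ULL

struct split_count {
	uint64_t pud_blocks; /* 1 GiB blocks found in the range */
	uint64_t pmd_blocks; /* 2 MiB blocks found in the range */
	uint64_t tables;     /* new tables needed to split all of them to PTEs */
};

/* Hypothetical model of a walk over [start, end) mapped with the largest blocks. */
static struct split_count count_split_tables(uint64_t start, uint64_t end)
{
	struct split_count c = { 0, 0, 0 };
	uint64_t addr = start;

	assert(start < end);
	assert(!(start & (PAGE_SIZE - 1)) && !(end & (PAGE_SIZE - 1)));

	while (addr < end) {
		if (!(addr & (PUD_SIZE - 1)) && end - addr >= PUD_SIZE) {
			/* One PMD table, then one PTE table per PMD entry. */
			c.pud_blocks++;
			c.tables += 1 + PTRS_PER_TABLE;
			addr += PUD_SIZE;
		} else if (!(addr & (PMD_SIZE - 1)) && end - addr >= PMD_SIZE) {
			c.pmd_blocks++;
			c.tables += 1;
			addr += PMD_SIZE;
		} else {
			/* Already PTE-mapped up to the next 2 MiB boundary: nothing to split. */
			uint64_t next = (addr | (PMD_SIZE - 1)) + 1;
			addr = next < end ? next : end;
		}
	}
	return c;
}
```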
> >
> >>
> >>> + split_pgtables_idx = 0;
> >>> + split_pgtables_count = 0;
> >>> +
> >>> + ret = walk_kernel_page_table_range_lockless(lstart, kstart,
> >>> + &collect_to_split_ops,
> >>> + NULL, NULL);
> >>> + if (!ret)
> >>> + ret = walk_kernel_page_table_range_lockless(kend, lend,
> >>> + &collect_to_split_ops,
> >>> + NULL, NULL);
> >>> + if (ret || !split_pgtables_count)
> >>> + goto error;
> >>> +
> >>> + ret = -ENOMEM;
> >>> +
> >>> + split_pgtables = kvmalloc(split_pgtables_count * sizeof(struct ptdesc *),
> >>> + GFP_KERNEL | __GFP_ZERO);
> >>> + if (!split_pgtables)
> >>> + goto error;
> >>> +
> >>> + for (i = 0; i < split_pgtables_count; i++) {
> >>> + /* The page table will be filled during splitting, so zeroing it is unnecessary. */
> >>> + split_pgtables[i] = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
> >>> + if (!split_pgtables[i])
> >>> + goto error;
> >>
> >> This looks potentially expensive on the boot path and only gets worse as
> >> the amount of memory grows. Maybe we should predicate this preallocation
> >> on preempt-rt?
> >
> > Agreed. Then I'll apply the pre-allocation only with PREEMPT_RT.
>
> I guess I'm missing something obvious but I don't understand the problem here...
> We are only deferring the allocation of all these pgtables, so the cost is
> neutral surely? Had we correctly guessed that the system doesn't support BBML2
> earlier, we would have had to allocate all these pgtables earlier.
>
> Another way to look at it is that we are still allocating the same number of
> pgtables in the existing fallback path, it's just that we are doing it inside
> the stop_machine().
>
> My vote would be _not_ to have a separate path for PREEMPT_RT, which will end up
> with significantly less testing...
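Some rough arithmetic separating the two costs under discussion, assuming a 4 KiB granule, a linear map made only of 2 MiB PMD blocks, and a split down to PTEs: the PTE tables themselves are needed whichever path does the split (Ryan's point), while the pointer array holding them exists only in the pre-allocation scheme. With 8-byte pointers, that works out to 512 tables (2 MiB of tables) plus a 4 KiB array per GiB of linear map. `prealloc_cost()` is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB tables, linear map made only of 2 MiB PMD blocks. */
#define SZ_4K (4096ULL)
#define SZ_2M (2ULL << 20)

struct prealloc_cost {
	uint64_t tables;      /* PTE tables needed to split every PMD block */
	uint64_t table_bytes; /* memory for those tables: needed on either path */
	uint64_t array_bytes; /* pointer array: extra, pre-allocation path only */
};

/* Hypothetical helper: cost of pre-allocating split tables for a linear map. */
static struct prealloc_cost prealloc_cost(uint64_t linear_map_bytes)
{
	struct prealloc_cost c;

	assert(linear_map_bytes % SZ_2M == 0);
	c.tables = linear_map_bytes / SZ_2M;
	c.table_bytes = c.tables * SZ_4K;
	c.array_bytes = c.tables * sizeof(void *);
	return c;
}
```

Both figures scale linearly with memory; the array is roughly 1/512 of the table memory when pointers are 8 bytes.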
IIUC, Will's concern is the additional memory allocated for
"split_pgtables", the array that holds the pre-allocated page tables.
As memory grows, this array grows too, and so does the cost.
That cost need not be imposed on !PREEMPT_RT, since there
the allocation can be done inside stop_machine() with GFP_ATOMIC.
But I also agree that if the cost is not that large,
your argument is convincing. Additionally, as I mentioned in another thread,
it would be good not to give the false impression that GFP_ATOMIC is fine
everywhere, even on PREEMPT_RT.
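The pre-allocation pattern being debated can be sketched in userspace: allocate every table up front where blocking is allowed, then only hand out already-allocated tables from the restricted context (stop_machine() in the patch), so no allocator call, GFP_ATOMIC or otherwise, happens there. This is a simplified model with hypothetical names (`struct table_pool`, `pool_init()`, `pool_take()`, `pool_destroy()`), using malloc()/free() in place of pagetable_alloc(); it only loosely mirrors `split_pgtables` and `split_pgtables_idx` from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool of pre-allocated tables. */
struct table_pool {
	void **tables; /* array of pre-allocated tables */
	size_t count;  /* number of tables allocated up front */
	size_t idx;    /* next table to hand out */
};

static void pool_destroy(struct table_pool *p);

/* Allocate everything while blocking is still allowed. Returns 0 on success. */
static int pool_init(struct table_pool *p, size_t count, size_t table_size)
{
	p->idx = 0;
	p->count = 0;
	p->tables = calloc(count, sizeof(*p->tables)); /* zeroed: unfilled slots are NULL */
	if (!p->tables)
		return -1;
	p->count = count;

	for (size_t i = 0; i < count; i++) {
		p->tables[i] = malloc(table_size);
		if (!p->tables[i]) {
			pool_destroy(p); /* free(NULL) on unfilled slots is a no-op */
			return -1;
		}
	}
	return 0;
}

/* Restricted context: never allocates, just returns the next table or NULL. */
static void *pool_take(struct table_pool *p)
{
	assert(p->idx <= p->count);
	if (p->idx == p->count)
		return NULL;
	return p->tables[p->idx++];
}

/* Free the tables that were not consumed, then the array itself. */
static void pool_destroy(struct table_pool *p)
{
	for (size_t i = p->idx; i < p->count; i++)
		free(p->tables[i]);
	free(p->tables);
	p->tables = NULL;
	p->count = 0;
	p->idx = 0;
}
```

Tables handed out by pool_take() belong to the caller (in the kernel, they become part of the page table), so pool_destroy() only releases the leftovers.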
--
Sincerely,
Yeoreum Yun
More information about the linux-arm-kernel mailing list