[PATCH v2 1/2] mm: Allow lockless kernel pagetable walking

Lorenzo Stoakes lorenzo.stoakes at oracle.com
Tue Jun 10 06:35:00 PDT 2025


On Tue, Jun 10, 2025 at 03:31:56PM +0200, David Hildenbrand wrote:
> On 10.06.25 15:27, Lorenzo Stoakes wrote:
> > On Tue, Jun 10, 2025 at 03:24:16PM +0200, David Hildenbrand wrote:
> > > On 10.06.25 14:07, Lorenzo Stoakes wrote:
> > > > OK so I think the best solution here is to just update check_ops_valid(), which
> > > > was kind of sucky anyway (we check everywhere but walk_page_range_mm() to
> > > > enforce the install pte thing).
> > > >
> > > > Let's do something like:
> > > >
> > > > #define OPS_MAY_INSTALL_PTE	(1<<0)
> > > > #define OPS_MAY_AVOID_LOCK	(1<<1)
> > > >
> > > > and update check_ops_valid() to take a flags or maybe 'capabilities' field.
> > > >
> > > > Then check based on this e.g.:
> > > >
> > > > if (ops->install_pte && !(capabilities & OPS_MAY_INSTALL_PTE))
> > > > 	return false;
> > > >
> > > > if (ops->walk_lock == PGWALK_NOLOCK && !(capabilities & OPS_MAY_AVOID_LOCK))
> > > > 	return false;
> > > >
> > >
> > > Hm. I mean, we really only want to allow this lockless check for
> > > walk_kernel_page_table_range(), right?
> > >
> > > > Having a walk_kernel_page_table_range_lockless() might (or might not) be
> > > better, to really only special-case this specific path.
> >
> > Agree completely, Dev - let's definitely do this.
> >
> > >
> > > So, I am wondering if we should further start splitting the
> > > kernel-page-table walker up from the mm walker, at least on the "entry"
> > > function for now.
> >
> > How do you mean?
>
> In particular, "struct mm_walk_ops"
>
> does not quite make sense when not actually walking a "real" mm.
>
> So maybe we should start having a separate structure where *vma,
> install_pte, walk_lock, hugetlb* do not even exist.
>
> It might be a bit of churn, though ... not sure if there could be an easy
> translation layer for now.

But you know... I looove churn right? <3 <3 <3 :)))

That's a nice idea, but I think it's something that should be a follow-up.
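
To make the idea a bit more concrete, here is a rough userspace sketch of what
a separate kernel-only ops structure plus a thin translation layer might look
like. Everything here is an assumption for illustration: struct
kernel_walk_ops and kernel_walk_ops_to_mm() are hypothetical names, the
callback signatures are heavily simplified, and the structs are stand-ins
rather than the real definitions from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified callback type; the real kernel callbacks take more arguments. */
typedef int (*pte_entry_fn)(unsigned long addr, unsigned long next);

/* Simplified stand-in for the kernel enum; only two values are modelled. */
enum page_walk_lock {
	PGWALK_RDLOCK = 0,
	PGWALK_NOLOCK,
};

/* Simplified stand-in for struct mm_walk_ops. */
struct mm_walk_ops {
	pte_entry_fn pte_entry;
	pte_entry_fn install_pte;
	enum page_walk_lock walk_lock;
};

/*
 * Hypothetical kernel-only ops: no vma, install_pte, walk_lock or hugetlb
 * members, so callers cannot set them at all.
 */
struct kernel_walk_ops {
	pte_entry_fn pte_entry;
};

/*
 * Hypothetical translation layer: build the mm_walk_ops that the existing
 * walker core expects. The lock mode is chosen by the entry function
 * (locked or lockless), never by the caller's ops.
 */
static struct mm_walk_ops
kernel_walk_ops_to_mm(const struct kernel_walk_ops *kops,
		      enum page_walk_lock lock)
{
	struct mm_walk_ops ops = {
		.pte_entry = kops->pte_entry,
		.install_pte = NULL,	/* never valid for kernel walks here */
		.walk_lock = lock,
	};

	return ops;
}

/* Trivial callback used only to exercise the translation. */
static int count_pte(unsigned long addr, unsigned long next)
{
	return next > addr ? 1 : 0;
}
```

With something like this, the two entry points would differ only in the lock
value they pass to the translation helper, and the mm-specific fields stay out
of reach of kernel page table walkers.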

Quite honestly I hate a lot about this code. I did some refactoring before, and
I might do some again.

todo++; ;)

I can tie this together actually with Muchun's suggestions from
https://lore.kernel.org/all/1AA4A4B3-AEBE-484A-8EE2-35A15035E748@linux.dev/ in
my 'page walk improvement' todo sub-list...
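
For reference, the capabilities check quoted earlier in this mail can be
sketched as a small standalone program. This is only a userspace illustration
under stated assumptions: the struct and enum are cut-down stand-ins for the
kernel ones, the install_pte signature is simplified, and dummy_install_pte()
and check_ops_valid_example() are hypothetical helpers that exist purely to
exercise the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OPS_MAY_INSTALL_PTE	(1U << 0)
#define OPS_MAY_AVOID_LOCK	(1U << 1)

/* Simplified stand-in for the kernel enum; only two values are modelled. */
enum page_walk_lock {
	PGWALK_RDLOCK = 0,
	PGWALK_NOLOCK,
};

/* Simplified stand-in for struct mm_walk_ops; callback signature reduced. */
struct mm_walk_ops {
	int (*install_pte)(unsigned long addr);
	enum page_walk_lock walk_lock;
};

/* Reject ops that use a feature the calling entry point did not grant. */
static bool check_ops_valid(const struct mm_walk_ops *ops,
			    unsigned int capabilities)
{
	if (ops->install_pte && !(capabilities & OPS_MAY_INSTALL_PTE))
		return false;

	if (ops->walk_lock == PGWALK_NOLOCK &&
	    !(capabilities & OPS_MAY_AVOID_LOCK))
		return false;

	return true;
}

/* Hypothetical callback, present only so install_pte can be non-NULL. */
static int dummy_install_pte(unsigned long addr)
{
	(void)addr;
	return 0;
}

/* Hypothetical usage: each entry point passes the capabilities it allows. */
static void check_ops_valid_example(void)
{
	const struct mm_walk_ops locked = { .walk_lock = PGWALK_RDLOCK };
	const struct mm_walk_ops lockless = { .walk_lock = PGWALK_NOLOCK };
	const struct mm_walk_ops installer = {
		.install_pte = dummy_install_pte,
		.walk_lock = PGWALK_RDLOCK,
	};

	assert(check_ops_valid(&locked, 0));
	assert(!check_ops_valid(&lockless, 0));
	assert(check_ops_valid(&lockless, OPS_MAY_AVOID_LOCK));
	assert(!check_ops_valid(&installer, OPS_MAY_AVOID_LOCK));
	assert(check_ops_valid(&installer, OPS_MAY_INSTALL_PTE));
}
```

The point of passing capabilities from the caller is that the policy lives in
one place, and an entry point that grants nothing rejects both install_pte and
lockless walks by default.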

>
> --
> Cheers,
>
> David / dhildenb
>



More information about the linux-arm-kernel mailing list