[PATCH] arm64: Implement clear_pages()
Ankur Arora
ankur.a.arora at oracle.com
Wed Mar 4 00:05:04 PST 2026
Linus Walleij <linusw at kernel.org> writes:
> On Tue, Mar 3, 2026 at 3:46 PM Will Deacon <will at kernel.org> wrote:
>
>> > +extern void clear_pages_asm(void *addr, unsigned int nbytes);
>> > +
>> > +static inline void clear_pages(void *addr, unsigned int npages)
>> > +{
>> > +	clear_pages_asm(addr, npages * PAGE_SIZE);
>> > +}
>> > +#define clear_pages clear_pages
>>
>> Hmm. From what I can tell, this just turns a branch in C code into a
>> branch in assembly, so it's hard to correlate that meaningfully with
>> the performance improvement you see.
>
> I think what I see is the effect of #define clear_pages clear_pages.
>
> Because without that <linux/mm.h> open codes:
>
> #ifndef clear_pages
> (...)
> static inline void clear_pages(void *addr, unsigned int npages)
> {
> 	do {
> 		clear_page(addr);
> 		addr += PAGE_SIZE;
> 	} while (--npages);
> }
> #endif
>
> So for clearing anything multi-page we get an outer loop
> and an inner loop inside clear_page(), but with clear_pages()
> implemented there is no outer loop, instead the total bytes is
> computed first (not one page at a time) and then there is a
> single loop.

So, on x86 (specifically on AMD Zen and Intel Icelake systems), the
extra computation, the branches, and, in an early version, the calls
to cond_resched() after every single page did not seem to matter.

This is probably uarch dependent, but it seems to me that the cost
of an extra address computation or an easily predicted branch
would probably be just noise.
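
To make the two shapes being compared concrete, here is a minimal
userspace sketch, not the kernel code. It assumes 4K pages and uses
memset() as a stand-in for the real clear_page() and clear_pages_asm();
all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Stand-in for the arch clear_page(); the real one has its own inner loop. */
static void sketch_clear_page(void *addr)
{
	memset(addr, 0, SKETCH_PAGE_SIZE);
}

/* Shape of the generic fallback: one outer-loop iteration per page. */
static void sketch_clear_pages_loop(void *addr, unsigned int npages)
{
	unsigned char *p = addr;

	assert(npages > 0);	/* the do/while form needs at least one page */
	do {
		sketch_clear_page(p);
		p += SKETCH_PAGE_SIZE;
	} while (--npages);
}

/* Shape of the patch: compute the total byte count once, then one loop. */
static void sketch_clear_pages_single(void *addr, unsigned int npages)
{
	memset(addr, 0, (size_t)npages * SKETCH_PAGE_SIZE);
}
```

Both variants leave the buffer in the same state; the only difference
is the per-page address computation and branch in the first one.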
>> If we have CPUs that are this sensitive to branches, perhaps we'd be
>> better off taking the opposite approach and moving more code into C
>> so that the compiler can optimise the control flow for us?
>
> Hm! That would be to create a default clear_page() in
> <linux/mm.h> and simply delete the existing lib/clear_page.S
> and let the default kick in.
>
> Right now every arch is implementing it custom.
> Maybe for no reason in some cases, I could try it!
>
> I doubt the compiler would emit this part though:
>
> #ifdef CONFIG_AS_HAS_MOPS
> (...)
> alternative_else_nop_endif
> setpn [x0]!, x1!, xzr
> setmn [x0]!, x1!, xzr
> seten [x0]!, x1!, xzr
> ret
>
> Three instructions to clear all pages. But maybe that is not good
> if this is a gigabyte, and the per-page loop provides a good breather
> preemption point in that case, and then we just shouldn't touch
> anything.

The code in folio_zero_user() (clear_contig_highpages()) takes care of
chunking up the clearing based on the preemption model.

The idea is that if you are running with preempt=none or
preempt=voluntary, then you might want to call cond_resched(), say,
every 32MB or so. If you are running with preempt=full or preempt=lazy,
then it would just clear a full GB page.

That would need the set[mpe]n instructions to be interruptible though.
(It seems to me that they are, but maybe someone could confirm.)

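
The chunking idea can be sketched roughly as below. This is a userspace
model under stated assumptions, not the code in folio_zero_user(): the
sketch_* names, the 4K page size, the resched counter, and the choice to
reschedule only between chunks are all made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Hypothetical counter standing in for calls to cond_resched(). */
static unsigned long sketch_resched_calls;

static void sketch_cond_resched(void)
{
	sketch_resched_calls++;
}

/* Stand-in for the arch clear_pages(); memset() replaces the real asm. */
static void sketch_clear_pages(void *addr, unsigned long npages)
{
	memset(addr, 0, npages * SKETCH_PAGE_SIZE);
}

/*
 * Clear npages in units of at most chunk_pages pages.
 * resched_points != 0 models preempt=none/voluntary, where a
 * rescheduling point is wanted between chunks. With resched_points == 0
 * (modelling preempt=full/lazy) the caller can pass chunk_pages == npages
 * so the whole range is cleared in one call.
 */
static void sketch_clear_chunked(void *addr, unsigned long npages,
				 unsigned long chunk_pages, int resched_points)
{
	unsigned char *p = addr;

	assert(chunk_pages > 0);
	while (npages) {
		unsigned long n = npages < chunk_pages ? npages : chunk_pages;

		sketch_clear_pages(p, n);
		p += n * SKETCH_PAGE_SIZE;
		npages -= n;
		/* Only reschedule if there is more work left to do. */
		if (resched_points && npages)
			sketch_cond_resched();
	}
}
```

With 4K pages, a 32MB chunk would correspond to chunk_pages = 8192.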
Thanks
--
ankur