[PATCH] arm64: runtime-const: save one instruction when ARM64_VA_BITS <= 48

Jisheng Zhang jszhang at kernel.org
Mon Mar 2 08:08:09 PST 2026


On Fri, Feb 27, 2026 at 04:34:04PM +0000, Catalin Marinas wrote:
> On Wed, Feb 25, 2026 at 10:46:13PM +0800, Jisheng Zhang wrote:
> > Currently, the runtime_const_ptr() uses 4 instructions to move a long
> > imm to GP, but when ARM64_VA_BITS <= 48(which is true for android and
> > armbian), the top 8bits of runtime cont ptr is all '1', so we can make
>                     ^^^^^
>                     8 or 16?

16 ;)
> 
> > use of the movn instruction to construct the imm's top 8bits and lower
> > 16bits at the same time, thus save one instruction.
> 
> This works as long as KASAN_{SW,HW}_TAGS is disabled, otherwise the top
> byte of a pointer is not guaranteed to be 0xff. I think both
> filename_init() and dcache_init() can pass tagged pointers.

Oops, you are right! I missed both: I keep KASAN_SW_TAGS disabled because
of its overhead, and I can't test KASAN_HW_TAGS since I don't have a
platform that supports it. I will take care of these two options in the
future.
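
To make the constraint concrete, here is a minimal userspace sketch that
models the movn + 2x movk idea in plain C. The helper names (movn, movk,
build_ptr_3insn) and the example addresses are hypothetical and chosen only
for illustration; this is not the code from the patch. It shows that the
3-instruction sequence reproduces a pointer only when bits 63:48 are all
ones, so a KASAN-tagged pointer would silently lose its tag:

```c
#include <stdint.h>

/* Model of "MOVN Xd, #imm16": Xd = ~imm16 (zero shift). */
static uint64_t movn(uint16_t imm16)
{
	return ~(uint64_t)imm16;
}

/* Model of "MOVK Xd, #imm16, LSL #shift": replace one 16-bit field. */
static uint64_t movk(uint64_t reg, uint16_t imm16, unsigned int shift)
{
	return (reg & ~(0xffffULL << shift)) | ((uint64_t)imm16 << shift);
}

/*
 * Hypothetical 3-instruction build: MOVN sets bits 15:0 and leaves
 * bits 63:16 as ones, then two MOVKs fill bits 31:16 and 47:32.
 * Bits 63:48 are never written again, so they stay 0xffff.
 */
static uint64_t build_ptr_3insn(uint64_t val)
{
	uint64_t reg = movn((uint16_t)~val);

	reg = movk(reg, (uint16_t)(val >> 16), 16);
	reg = movk(reg, (uint16_t)(val >> 32), 32);
	return reg;
}

/* The sequence above is only valid when this returns nonzero. */
static int top16_all_ones(uint64_t val)
{
	return (val >> 48) == 0xffff;
}
```

With an untagged address such as 0xffff800012345678 the result matches the
input. With a tagged one such as 0xf2ff800012345678 the top byte comes back
as 0xff, which is exactly the case Catalin describes.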
> 
> > diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
> > index be5915669d23..6797dd37d690 100644
> > --- a/arch/arm64/include/asm/runtime-const.h
> > +++ b/arch/arm64/include/asm/runtime-const.h
> > @@ -7,6 +7,8 @@
> >  /* Sigh. You can still run arm64 in BE mode */
> >  #include <asm/byteorder.h>
> >  
> > +#if CONFIG_ARM64_VA_BITS > 48
> 
> You could use VA_BITS, it's shorter, though if you add the KASAN checks
> it's a pretty long #if to copy all over the place. We could untag the
> pointer but it kind of defeats the purpose of enabling KASAN in the
> first place.

Usually, the runtime const ptr is set once during boot and is read-only
afterwards, so IMHO we don't need KASAN to catch memory bugs related to
that ptr.

> 
> Given that Android enables KASAN_HW_TAGS by default, not sure we should
> bother with this change. Do you have any perf data to show that it's
> worth it?

Good question. I guess a micro benchmark that just measures 4 instructions
vs 3 instructions, and thus shows a 25% saving, can't persuade you to merge
it. Let me find or write a userspace program that iterates over a deep
directory to show the improvement. Any hints are appreciated.

Thanks


