[mm/contpte v3 1/1] mm/contpte: Optimize loop to reduce redundant operations

Xavier xavier_qy at 163.com
Wed Apr 16 09:15:37 PDT 2025




At 2025-04-16 16:57:06, "David Laight" <david.laight.linux at gmail.com> wrote:
>On Tue, 15 Apr 2025 16:22:05 +0800
>Xavier <xavier_qy at 163.com> wrote:
>
>> This commit optimizes the contpte_ptep_get function by adding early
>>  termination logic. It checks if the dirty and young bits of orig_pte
>>  are already set and skips redundant bit-setting operations during
>>  the loop. This reduces unnecessary iterations and improves performance.
>
>Benchmarks?
>
>As has been pointed out before CONT_PTES is small and IIRC dirty+young
>is unusual.

I haven't found a suitable benchmark yet. I will write some more general
test scenarios and post the results in a follow-up email.

>
>> 
>> Signed-off-by: Xavier <xavier_qy at 163.com>
>> ---
>>  arch/arm64/mm/contpte.c | 20 ++++++++++++++++++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
>> index bcac4f55f9c1..0acfee604947 100644
>> --- a/arch/arm64/mm/contpte.c
>> +++ b/arch/arm64/mm/contpte.c
>> @@ -152,6 +152,16 @@ void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
>>  }
>>  EXPORT_SYMBOL_GPL(__contpte_try_unfold);
>>  
>> +/* Note: in order to improve efficiency, using this macro will modify the
>> + * passed-in parameters.*/
>
>... this macro modifies ...
>
>But you can make it obvious by passing by reference.
>The compiler will generate the same code.
>

Agreed, this part can also be refined further.
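
For illustration only, here is a rough user-space sketch of what passing
by reference could look like. It is not the kernel code: the helper names
(accumulate_flags, gather), the bit positions, and the value of CONT_PTES
are assumptions made up for this example. The accumulator is passed by
pointer, so the modification is visible at the call site, and the loop
stops as soon as both bits have been seen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONT_PTES 16           /* assumption: illustrative block size */
#define PTE_DIRTY (1u << 0)    /* hypothetical bit position */
#define PTE_YOUNG (1u << 1)    /* hypothetical bit position */
#define PTE_FLAGS (PTE_DIRTY | PTE_YOUNG)

/*
 * Fold the dirty/young bits of one entry into *acc.
 * Returns true once both bits are set, i.e. nothing more can be learned.
 * Taking a pointer makes it obvious to the caller that acc is modified.
 */
static inline bool accumulate_flags(uint32_t *acc, uint32_t pte)
{
	*acc |= pte & PTE_FLAGS;
	return (*acc & PTE_FLAGS) == PTE_FLAGS;
}

/*
 * Scan up to n entries, stopping early when both bits have been found.
 * *visited receives the number of entries actually examined.
 */
static uint32_t gather(const uint32_t *ptes, size_t n, size_t *visited)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (accumulate_flags(&acc, ptes[i])) {
			i++;	/* count the entry that completed the set */
			break;
		}
	}
	*visited = i;
	return acc;
}
```

With a static inline helper like this, the compiler should be able to
generate the same code as the macro version, while the call site shows
which variable gets updated.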

--
Thanks,
Xavier

