[PATCH v2] arm64: optimize flush tlb kernel range
Anshuman Khandual
anshuman.khandual at arm.com
Sun Sep 22 21:53:51 PDT 2024
On 9/20/24 16:47, Kefeng Wang wrote:
>
>
> On 2024/9/20 14:10, Anshuman Khandual wrote:
>>
>>
>> On 9/20/24 09:25, Kefeng Wang wrote:
>>> Currently the kernel TLB is flushed page by page if the target
>>> VA range is less than MAX_DVM_OPS * PAGE_SIZE; otherwise we
>>> fall back to a full TLBI ALL.
>>>
>>> We can optimize this when the CPU supports TLB range operations
>>> by using __flush_tlb_range_op(), as the other TLB range flush
>>> paths already do, to improve performance.
>>>
>>> Co-developed-by: Yicong Yang <yangyicong at hisilicon.com>
>>> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang at huawei.com>
>>> ---
>>> v2:
>>> - address Catalin's comments and use __flush_tlb_range_op() directly
>>>
>>> arch/arm64/include/asm/tlbflush.h | 24 +++++++++++++++++-------
>>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>>> index 95fbc8c05607..42f0ec14fb2c 100644
>>> --- a/arch/arm64/include/asm/tlbflush.h
>>> +++ b/arch/arm64/include/asm/tlbflush.h
>>> @@ -492,19 +492,29 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>>> static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>> {
>>> - unsigned long addr;
>>> + const unsigned long stride = PAGE_SIZE;
>>> + unsigned long pages;
>>> +
>>> + start = round_down(start, stride);
>>> + end = round_up(end, stride);
>>> + pages = (end - start) >> PAGE_SHIFT;
>>> - if ((end - start) > (MAX_DVM_OPS * PAGE_SIZE)) {
>>> + /*
>>> + * When not uses TLB range ops, we can handle up to
>>> + * (MAX_DVM_OPS - 1) pages;
>>> + * When uses TLB range ops, we can handle up to
>>> + * MAX_TLBI_RANGE_PAGES pages.
>>> + */
>>> + if ((!system_supports_tlb_range() &&
>>> + (end - start) >= (MAX_DVM_OPS * stride)) ||
>>> + pages > MAX_TLBI_RANGE_PAGES) {
>>> flush_tlb_all();
>>> return;
>>> }
>>
>> Could the above conditional check for flush_tlb_all() be factored out
>> in a helper, which can also be used in __flush_tlb_range_nosync() ?
>
> How about adding this helper, not good at naming,
>
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 42f0ec14fb2c..b7043ff0945f 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -431,6 +431,23 @@ do { \
> #define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
> __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, kvm_lpa2_is_enabled());
>
> +static inline int __flush_tlb_range_limit_excess(unsigned long start,
> + unsigned long end, unsigned long pages, unsigned long stride)
Helper name sounds just fine.
> +{
> + /*
> + * When not uses TLB range ops, we can handle up to
> + * (MAX_DVM_OPS - 1) pages;
> + * When uses TLB range ops, we can handle up to
> + * MAX_TLBI_RANGE_PAGES pages.
> + */
This comment could be reworded somewhat, perhaps like this, or
maybe you could improve on it further:
/*
* When the system does not support TLB range based flush
* operation, (MAX_DVM_OPS - 1) pages can be handled. But
* with TLB range based operation, MAX_TLBI_RANGE_PAGES
* pages can be handled.
*/
> + if ((!system_supports_tlb_range() &&
> + (end - start) >= (MAX_DVM_OPS * stride)) ||
> + pages > MAX_TLBI_RANGE_PAGES)
> + return -ERANGE;
> +
> + return 0;
> +}
> +
> static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
> unsigned long start, unsigned long end,
> unsigned long stride, bool last_level,
> @@ -442,15 +459,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
> end = round_up(end, stride);
> pages = (end - start) >> PAGE_SHIFT;
>
> - /*
> - * When not uses TLB range ops, we can handle up to
> - * (MAX_DVM_OPS - 1) pages;
> - * When uses TLB range ops, we can handle up to
> - * MAX_TLBI_RANGE_PAGES pages.
> - */
> - if ((!system_supports_tlb_range() &&
> - (end - start) >= (MAX_DVM_OPS * stride)) ||
> - pages > MAX_TLBI_RANGE_PAGES) {
> + if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
> flush_tlb_mm(vma->vm_mm);
> return;
> }
>
But yes, this factored out helper should now be used in flush_tlb_kernel_range() as well.
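
To illustrate the suggestion, here is a rough userspace model of how
flush_tlb_kernel_range() might decide between a range flush and a full
flush once it shares the helper. It is only a sketch, not the actual
patch: the constant values, the tlb_range_supported toggle, the
flush_path enum and model_flush_tlb_kernel_range() are stand-ins
invented for the example, and the helper is spelled without its leading
underscores because that prefix is reserved outside the kernel.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in values for illustration only (4K pages assumed). */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define MAX_DVM_OPS		512UL		/* assumed value */
#define MAX_TLBI_RANGE_PAGES	(32UL << 16)	/* assumed value */

/* Hypothetical toggle standing in for the CPU capability check. */
static bool tlb_range_supported;

static bool system_supports_tlb_range(void)
{
	return tlb_range_supported;
}

/*
 * Model of the proposed helper: returns -ERANGE when the range is too
 * large to flush piecewise, 0 otherwise.
 */
static int flush_tlb_range_limit_excess(unsigned long start,
		unsigned long end, unsigned long pages, unsigned long stride)
{
	/*
	 * When the system does not support TLB range based flush
	 * operation, (MAX_DVM_OPS - 1) pages can be handled. But
	 * with TLB range based operation, MAX_TLBI_RANGE_PAGES
	 * pages can be handled.
	 */
	if ((!system_supports_tlb_range() &&
	     (end - start) >= (MAX_DVM_OPS * stride)) ||
	    pages > MAX_TLBI_RANGE_PAGES)
		return -ERANGE;

	return 0;
}

/* Which path the modelled kernel-range flush would take. */
enum flush_path {
	FLUSH_BY_RANGE_OP,	/* would call __flush_tlb_range_op() */
	FLUSH_ALL,		/* would call flush_tlb_all() */
};

static enum flush_path model_flush_tlb_kernel_range(unsigned long start,
						    unsigned long end)
{
	const unsigned long stride = PAGE_SIZE;
	unsigned long pages;

	/* Open-coded round_down()/round_up() for a power-of-two stride. */
	start = start & ~(stride - 1);
	end = (end + stride - 1) & ~(stride - 1);
	pages = (end - start) >> PAGE_SHIFT;

	if (flush_tlb_range_limit_excess(start, end, pages, stride))
		return FLUSH_ALL;

	return FLUSH_BY_RANGE_OP;
}
```

With these stand-in values, a 512-page range takes the full-flush path
when range operations are unavailable and the range-op path when they
are, which is the behavioural difference the patch is after.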