[PATCH bpf-next v2 0/4] Add ftrace direct call for arm64

Jiri Olsa olsajiri at gmail.com
Wed Oct 5 15:12:17 PDT 2022


On Wed, Oct 05, 2022 at 11:30:19AM -0400, Steven Rostedt wrote:
> On Wed, 5 Oct 2022 17:10:33 +0200
> Florent Revest <revest at chromium.org> wrote:
> 
> > On Wed, Oct 5, 2022 at 5:07 PM Steven Rostedt <rostedt at goodmis.org> wrote:
> > >
> > > On Wed, 5 Oct 2022 22:54:15 +0800
> > > Xu Kuohai <xukuohai at huawei.com> wrote:
> > >  
> > > > 1.3 attach bpf prog with with direct call, bpftrace -e 'kfunc:vfs_write {}'
> > > >
> > > > # dd if=/dev/zero of=/dev/null count=1000000
> > > > 1000000+0 records in
> > > > 1000000+0 records out
> > > > 512000000 bytes (512 MB, 488 MiB) copied, 1.72973 s, 296 MB/s
> > > >
> > > >
> > > > 1.4 attach bpf prog with with indirect call, bpftrace -e 'kfunc:vfs_write {}'
> > > >
> > > > # dd if=/dev/zero of=/dev/null count=1000000
> > > > 1000000+0 records in
> > > > 1000000+0 records out
> > > > 512000000 bytes (512 MB, 488 MiB) copied, 1.99179 s, 257 MB/s  
> > 
> > Thanks for the measurements Xu!
> > 
> > > Can you show the implementation of the indirect call you used?  
> > 
> > Xu used my development branch here
> > https://github.com/FlorentRevest/linux/commits/fprobe-min-args

nice :) I guess you did not try to run it on x86, I had to add some small
changes and disable HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to compile it

> 
> That looks like it could be optimized quite a bit too.
> 
> Specifically this part:
> 
> static bool bpf_fprobe_entry(struct fprobe *fp, unsigned long ip, struct ftrace_regs *regs, void *private)
> {
> 	struct bpf_fprobe_call_context *call_ctx = private;
> 	struct bpf_fprobe_context *fprobe_ctx = fp->ops.private;
> 	struct bpf_tramp_links *links = fprobe_ctx->links;
> 	struct bpf_tramp_links *fentry = &links[BPF_TRAMP_FENTRY];
> 	struct bpf_tramp_links *fmod_ret = &links[BPF_TRAMP_MODIFY_RETURN];
> 	struct bpf_tramp_links *fexit = &links[BPF_TRAMP_FEXIT];
> 	int i, ret;
> 
> 	memset(&call_ctx->ctx, 0, sizeof(call_ctx->ctx));
> 	call_ctx->ip = ip;
> 	for (i = 0; i < fprobe_ctx->nr_args; i++)
> 		call_ctx->args[i] = ftrace_regs_get_argument(regs, i);
> 
> 	for (i = 0; i < fentry->nr_links; i++)
> 		call_bpf_prog(fentry->links[i], &call_ctx->ctx, call_ctx->args);
> 
> 	call_ctx->args[fprobe_ctx->nr_args] = 0;
> 	for (i = 0; i < fmod_ret->nr_links; i++) {
> 		ret = call_bpf_prog(fmod_ret->links[i], &call_ctx->ctx,
> 				      call_ctx->args);
> 
> 		if (ret) {
> 			ftrace_regs_set_return_value(regs, ret);
> 			ftrace_override_function_with_return(regs);
> 
> 			bpf_fprobe_exit(fp, ip, regs, private);
> 			return false;
> 		}
> 	}
> 
> 	return fexit->nr_links;
> }
> 
> There's a lot of low hanging fruit to speed up there. I wouldn't be too
> fast to throw out this solution if it hasn't had the care that direct calls
> have had to speed that up.
> 
> For example, trampolines currently only allow to attach to functions with 6
> parameters or less (3 on x86_32). You could make 7 specific callbacks, with
> zero to 6 parameters, and unroll the argument loop.
> 
> Would also be interesting to run perf to see where the overhead is. There
> may be other locations to work on to make it almost as fast as direct
> callers without the other baggage.

I can boot the change and run tests in qemu but for some reason it
won't boot on hw, so I have just perf report from qemu so far

there's fprobe/rethook machinery showing out as expected

jirka


---
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 23K of event 'cpu-clock:k'
# Event count (approx.): 5841250000
#
# Overhead  Command  Shared Object                                   Symbol                                            
# ........  .......  ..............................................  ..................................................
#
    18.65%  bench    [kernel.kallsyms]                               [k] syscall_enter_from_user_mode
            |
            ---syscall_enter_from_user_mode
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

    13.03%  bench    [kernel.kallsyms]                               [k] seqcount_lockdep_reader_access.constprop.0
            |
            ---seqcount_lockdep_reader_access.constprop.0
               ktime_get_coarse_real_ts64
               syscall_trace_enter.constprop.0
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     9.49%  bench    [kernel.kallsyms]                               [k] rethook_try_get
            |
            ---rethook_try_get
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     8.71%  bench    [kernel.kallsyms]                               [k] rethook_recycle
            |
            ---rethook_recycle
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     4.31%  bench    [kernel.kallsyms]                               [k] rcu_is_watching
            |
            ---rcu_is_watching
               |          
               |--1.49%--rethook_try_get
               |          fprobe_handler
               |          ftrace_trampoline
               |          __x64_sys_getpgid
               |          do_syscall_64
               |          entry_SYSCALL_64_after_hwframe
               |          syscall
               |          
               |--1.10%--do_getpgid
               |          __x64_sys_getpgid
               |          do_syscall_64
               |          entry_SYSCALL_64_after_hwframe
               |          syscall
               |          
               |--1.02%--__bpf_prog_exit
               |          call_bpf_prog.isra.0
               |          bpf_fprobe_entry
               |          fprobe_handler
               |          ftrace_trampoline
               |          __x64_sys_getpgid
               |          do_syscall_64
               |          entry_SYSCALL_64_after_hwframe
               |          syscall
               |          
                --0.70%--__bpf_prog_enter
                          call_bpf_prog.isra.0
                          bpf_fprobe_entry
                          fprobe_handler
                          ftrace_trampoline
                          __x64_sys_getpgid
                          do_syscall_64
                          entry_SYSCALL_64_after_hwframe
                          syscall

     2.94%  bench    [kernel.kallsyms]                               [k] lock_release
            |
            ---lock_release
               |          
               |--1.51%--call_bpf_prog.isra.0
               |          bpf_fprobe_entry
               |          fprobe_handler
               |          ftrace_trampoline
               |          __x64_sys_getpgid
               |          do_syscall_64
               |          entry_SYSCALL_64_after_hwframe
               |          syscall
               |          
                --1.43%--do_getpgid
                          __x64_sys_getpgid
                          do_syscall_64
                          entry_SYSCALL_64_after_hwframe
                          syscall

     2.91%  bench    bpf_prog_21856463590f61f1_bench_trigger_fentry  [k] bpf_prog_21856463590f61f1_bench_trigger_fentry
            |
            ---bpf_prog_21856463590f61f1_bench_trigger_fentry
               |          
                --2.66%--call_bpf_prog.isra.0
                          bpf_fprobe_entry
                          fprobe_handler
                          ftrace_trampoline
                          __x64_sys_getpgid
                          do_syscall_64
                          entry_SYSCALL_64_after_hwframe
                          syscall

     2.69%  bench    [kernel.kallsyms]                               [k] bpf_fprobe_entry
            |
            ---bpf_fprobe_entry
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     2.60%  bench    [kernel.kallsyms]                               [k] lock_acquire
            |
            ---lock_acquire
               |          
               |--1.34%--__bpf_prog_enter
               |          call_bpf_prog.isra.0
               |          bpf_fprobe_entry
               |          fprobe_handler
               |          ftrace_trampoline
               |          __x64_sys_getpgid
               |          do_syscall_64
               |          entry_SYSCALL_64_after_hwframe
               |          syscall
               |          
                --1.24%--do_getpgid
                          __x64_sys_getpgid
                          do_syscall_64
                          entry_SYSCALL_64_after_hwframe
                          syscall

     2.42%  bench    [kernel.kallsyms]                               [k] syscall_exit_to_user_mode_prepare
            |
            ---syscall_exit_to_user_mode_prepare
               syscall_exit_to_user_mode
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     2.37%  bench    [kernel.kallsyms]                               [k] __audit_syscall_entry
            |
            ---__audit_syscall_entry
               syscall_trace_enter.constprop.0
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               |          
                --2.36%--syscall

     2.35%  bench    [kernel.kallsyms]                               [k] syscall_trace_enter.constprop.0
            |
            ---syscall_trace_enter.constprop.0
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     2.12%  bench    [kernel.kallsyms]                               [k] check_preemption_disabled
            |
            ---check_preemption_disabled
               |          
                --1.55%--rcu_is_watching
                          |          
                           --0.59%--do_getpgid
                                     __x64_sys_getpgid
                                     do_syscall_64
                                     entry_SYSCALL_64_after_hwframe
                                     syscall

     2.00%  bench    [kernel.kallsyms]                               [k] fprobe_handler
            |
            ---fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     1.94%  bench    [kernel.kallsyms]                               [k] local_irq_disable_exit_to_user
            |
            ---local_irq_disable_exit_to_user
               syscall_exit_to_user_mode
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     1.84%  bench    [kernel.kallsyms]                               [k] rcu_read_lock_sched_held
            |
            ---rcu_read_lock_sched_held
               |          
               |--0.93%--lock_acquire
               |          
                --0.90%--lock_release

     1.71%  bench    [kernel.kallsyms]                               [k] migrate_enable
            |
            ---migrate_enable
               __bpf_prog_exit
               call_bpf_prog.isra.0
               bpf_fprobe_entry
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     1.66%  bench    [kernel.kallsyms]                               [k] call_bpf_prog.isra.0
            |
            ---call_bpf_prog.isra.0
               bpf_fprobe_entry
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     1.53%  bench    [kernel.kallsyms]                               [k] __rcu_read_unlock
            |
            ---__rcu_read_unlock
               |          
               |--0.86%--__bpf_prog_exit
               |          call_bpf_prog.isra.0
               |          bpf_fprobe_entry
               |          fprobe_handler
               |          ftrace_trampoline
               |          __x64_sys_getpgid
               |          do_syscall_64
               |          entry_SYSCALL_64_after_hwframe
               |          syscall
               |          
                --0.66%--do_getpgid
                          __x64_sys_getpgid
                          do_syscall_64
                          entry_SYSCALL_64_after_hwframe
                          syscall

     1.31%  bench    [kernel.kallsyms]                               [k] debug_smp_processor_id
            |
            ---debug_smp_processor_id
               |          
                --0.77%--rcu_is_watching

     1.22%  bench    [kernel.kallsyms]                               [k] migrate_disable
            |
            ---migrate_disable
               __bpf_prog_enter
               call_bpf_prog.isra.0
               bpf_fprobe_entry
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     1.19%  bench    [kernel.kallsyms]                               [k] __bpf_prog_enter
            |
            ---__bpf_prog_enter
               call_bpf_prog.isra.0
               bpf_fprobe_entry
               fprobe_handler
               ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.84%  bench    [kernel.kallsyms]                               [k] __radix_tree_lookup
            |
            ---__radix_tree_lookup
               find_task_by_pid_ns
               do_getpgid
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.82%  bench    [kernel.kallsyms]                               [k] do_getpgid
            |
            ---do_getpgid
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.78%  bench    [kernel.kallsyms]                               [k] debug_lockdep_rcu_enabled
            |
            ---debug_lockdep_rcu_enabled
               |          
                --0.63%--rcu_read_lock_sched_held

     0.74%  bench    ftrace_trampoline                               [k] ftrace_trampoline
            |
            ---ftrace_trampoline
               __x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.72%  bench    [kernel.kallsyms]                               [k] preempt_count_add
            |
            ---preempt_count_add

     0.71%  bench    [kernel.kallsyms]                               [k] ktime_get_coarse_real_ts64
            |
            ---ktime_get_coarse_real_ts64
               syscall_trace_enter.constprop.0
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.69%  bench    [kernel.kallsyms]                               [k] do_syscall_64
            |
            ---do_syscall_64
               entry_SYSCALL_64_after_hwframe
               |          
                --0.68%--syscall

     0.60%  bench    [kernel.kallsyms]                               [k] preempt_count_sub
            |
            ---preempt_count_sub

     0.59%  bench    [kernel.kallsyms]                               [k] __rcu_read_lock
            |
            ---__rcu_read_lock

     0.59%  bench    [kernel.kallsyms]                               [k] __x64_sys_getpgid
            |
            ---__x64_sys_getpgid
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.58%  bench    [kernel.kallsyms]                               [k] __audit_syscall_exit
            |
            ---__audit_syscall_exit
               syscall_exit_to_user_mode_prepare
               syscall_exit_to_user_mode
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.53%  bench    [kernel.kallsyms]                               [k] audit_reset_context
            |
            ---audit_reset_context
               syscall_exit_to_user_mode_prepare
               syscall_exit_to_user_mode
               do_syscall_64
               entry_SYSCALL_64_after_hwframe
               syscall

     0.45%  bench    [kernel.kallsyms]                               [k] rcu_read_lock_held
     0.36%  bench    [kernel.kallsyms]                               [k] find_task_by_vpid
     0.32%  bench    [kernel.kallsyms]                               [k] __bpf_prog_exit
     0.26%  bench    [kernel.kallsyms]                               [k] syscall_exit_to_user_mode
     0.20%  bench    [kernel.kallsyms]                               [k] idr_find
     0.18%  bench    [kernel.kallsyms]                               [k] find_task_by_pid_ns
     0.17%  bench    [kernel.kallsyms]                               [k] update_prog_stats
     0.16%  bench    [kernel.kallsyms]                               [k] _raw_spin_unlock_irqrestore
     0.14%  bench    [kernel.kallsyms]                               [k] pid_task
     0.04%  bench    [kernel.kallsyms]                               [k] memchr_inv
     0.04%  bench    [kernel.kallsyms]                               [k] smp_call_function_many_cond
     0.03%  bench    [kernel.kallsyms]                               [k] do_user_addr_fault
     0.03%  bench    [kernel.kallsyms]                               [k] kallsyms_expand_symbol.constprop.0
     0.03%  bench    [kernel.kallsyms]                               [k] native_flush_tlb_global
     0.03%  bench    [kernel.kallsyms]                               [k] __change_page_attr_set_clr
     0.02%  bench    [kernel.kallsyms]                               [k] memcpy_erms
     0.02%  bench    [kernel.kallsyms]                               [k] unwind_next_frame
     0.02%  bench    [kernel.kallsyms]                               [k] copy_user_enhanced_fast_string
     0.01%  bench    [kernel.kallsyms]                               [k] __orc_find
     0.01%  bench    [kernel.kallsyms]                               [k] call_rcu
     0.01%  bench    [kernel.kallsyms]                               [k] __alloc_pages
     0.01%  bench    [kernel.kallsyms]                               [k] __purge_vmap_area_lazy
     0.01%  bench    [kernel.kallsyms]                               [k] __softirqentry_text_start
     0.01%  bench    [kernel.kallsyms]                               [k] __stack_depot_save
     0.01%  bench    [kernel.kallsyms]                               [k] __up_read
     0.01%  bench    [kernel.kallsyms]                               [k] __virt_addr_valid
     0.01%  bench    [kernel.kallsyms]                               [k] clear_page_erms
     0.01%  bench    [kernel.kallsyms]                               [k] deactivate_slab
     0.01%  bench    [kernel.kallsyms]                               [k] do_check_common
     0.01%  bench    [kernel.kallsyms]                               [k] finish_task_switch.isra.0
     0.01%  bench    [kernel.kallsyms]                               [k] free_unref_page_list
     0.01%  bench    [kernel.kallsyms]                               [k] ftrace_rec_iter_next
     0.01%  bench    [kernel.kallsyms]                               [k] handle_mm_fault
     0.01%  bench    [kernel.kallsyms]                               [k] orc_find.part.0
     0.01%  bench    [kernel.kallsyms]                               [k] try_charge_memcg
     0.00%  bench    [kernel.kallsyms]                               [k] ___slab_alloc
     0.00%  bench    [kernel.kallsyms]                               [k] __fdget_pos
     0.00%  bench    [kernel.kallsyms]                               [k] __handle_mm_fault
     0.00%  bench    [kernel.kallsyms]                               [k] __is_insn_slot_addr
     0.00%  bench    [kernel.kallsyms]                               [k] __kmalloc
     0.00%  bench    [kernel.kallsyms]                               [k] __mod_lruvec_page_state
     0.00%  bench    [kernel.kallsyms]                               [k] __mod_node_page_state
     0.00%  bench    [kernel.kallsyms]                               [k] __mutex_lock
     0.00%  bench    [kernel.kallsyms]                               [k] __raw_spin_lock_init
     0.00%  bench    [kernel.kallsyms]                               [k] alloc_vmap_area
     0.00%  bench    [kernel.kallsyms]                               [k] allocate_slab
     0.00%  bench    [kernel.kallsyms]                               [k] audit_get_tty
     0.00%  bench    [kernel.kallsyms]                               [k] bpf_ksym_find
     0.00%  bench    [kernel.kallsyms]                               [k] btf_check_all_metas
     0.00%  bench    [kernel.kallsyms]                               [k] btf_put
     0.00%  bench    [kernel.kallsyms]                               [k] cmpxchg_double_slab.constprop.0.isra.0
     0.00%  bench    [kernel.kallsyms]                               [k] do_fault
     0.00%  bench    [kernel.kallsyms]                               [k] do_raw_spin_trylock
     0.00%  bench    [kernel.kallsyms]                               [k] find_vma
     0.00%  bench    [kernel.kallsyms]                               [k] fs_reclaim_release
     0.00%  bench    [kernel.kallsyms]                               [k] ftrace_check_record
     0.00%  bench    [kernel.kallsyms]                               [k] ftrace_replace_code
     0.00%  bench    [kernel.kallsyms]                               [k] get_mem_cgroup_from_mm
     0.00%  bench    [kernel.kallsyms]                               [k] get_page_from_freelist
     0.00%  bench    [kernel.kallsyms]                               [k] in_gate_area_no_mm
     0.00%  bench    [kernel.kallsyms]                               [k] in_task_stack
     0.00%  bench    [kernel.kallsyms]                               [k] kernel_text_address
     0.00%  bench    [kernel.kallsyms]                               [k] kernfs_fop_read_iter
     0.00%  bench    [kernel.kallsyms]                               [k] kernfs_put_active
     0.00%  bench    [kernel.kallsyms]                               [k] kfree
     0.00%  bench    [kernel.kallsyms]                               [k] kmem_cache_alloc
     0.00%  bench    [kernel.kallsyms]                               [k] ksys_read
     0.00%  bench    [kernel.kallsyms]                               [k] lookup_address_in_pgd
     0.00%  bench    [kernel.kallsyms]                               [k] mlock_page_drain_local
     0.00%  bench    [kernel.kallsyms]                               [k] page_remove_rmap
     0.00%  bench    [kernel.kallsyms]                               [k] post_alloc_hook
     0.00%  bench    [kernel.kallsyms]                               [k] preempt_schedule_irq
     0.00%  bench    [kernel.kallsyms]                               [k] queue_work_on
     0.00%  bench    [kernel.kallsyms]                               [k] stack_trace_save
     0.00%  bench    [kernel.kallsyms]                               [k] within_error_injection_list


#
# (Tip: To record callchains for each sample: perf record -g)
#




More information about the linux-arm-kernel mailing list