[PATCH bpf] bpf, arm64: Fix stack frame construction for struct_ops trampoline

Xu Kuohai xukuohai at huaweicloud.com
Fri Oct 25 00:51:14 PDT 2024


On 10/25/2024 12:24 AM, Alexei Starovoitov wrote:
> On Thu, Oct 24, 2024 at 6:48 AM Xu Kuohai <xukuohai at huaweicloud.com> wrote:
>>
>> On 10/23/2024 11:16 AM, Xu Kuohai wrote:
>>> On 10/23/2024 7:37 AM, Puranjay Mohan wrote:
>>>> On Wed, Oct 23, 2024 at 12:50 AM Alexei Starovoitov
>>>> <alexei.starovoitov at gmail.com> wrote:
>>>>>
>>>>> On Sat, Oct 19, 2024 at 2:15 AM Xu Kuohai <xukuohai at huaweicloud.com> wrote:
>>>>>>
>>>>>> From: Xu Kuohai <xukuohai at huawei.com>
>>>>>>
>>>>>> The callsite layout for arm64 fentry is:
>>>>>>
>>>>>> mov x9, lr
>>>>>> nop
>>>>>>
>>>>>> When a bpf prog is attached, the nop instruction is patched to a call
>>>>>> to the bpf trampoline:
>>>>>>
>>>>>> mov x9, lr
>>>>>> bl <bpf trampoline>
>>>>>>
>>>>>> This passes two return addresses to the bpf trampoline: the return
>>>>>> address for the traced function/prog, stored in x9, and the return
>>>>>> address for the bpf trampoline itself, stored in lr. To ensure that
>>>>>> stacktraces work properly, the bpf trampoline constructs two fake
>>>>>> function stack frames using x9 and lr.
>>>>>>
>>>>>> However, struct_ops progs are used as function callbacks and are
>>>>>> invoked directly, without x9 being set the way an fentry callsite
>>>>>> sets it. Therefore, only one stack frame should be constructed for
>>>>>> struct_ops, using lr.
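
[Side note: in the arm64 JIT, prepare_trampoline() in
arch/arm64/net/bpf_jit_comp.c emits the two fake frames roughly like
this, lightly paraphrased, with the comments reworded by me:

/* frame record holding x9, i.e. the return address into the
 * traced function's caller
 */
emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);

/* frame record holding lr, i.e. the return address into the
 * traced function itself
 */
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);

For struct_ops, x9 holds nothing meaningful, so only the lr-based
record should be emitted.]
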
>>>>>
>>>>> Are you saying that currently the stack unwinder on arm64 is
>>>>> completely broken for struct_ops progs?
>>>>> Or that it shows an extra frame that doesn't need to be shown?
>>>>>
>>>>> If the former, then it's certainly bpf tree material.
>>>>> If the latter, then bpf-next will do.
>>>>> Pls clarify.
>>>>
>>>> It is not completely broken; it only shows an extra garbage frame
>>>> between the trampoline's caller and that caller's own caller.
>>>>
>>>
>>> Yep, exactly. Here is a perf script sample, where tcp_ack+0x404
>>> is the garbage frame.
>>>
>>> ffffffc0801a04b4 bpf_prog_50992e55a0f655a9_bpf_cubic_cong_avoid+0x98 (bpf_prog_50992e55a0f655a9_bpf_cubic_cong_avoid)
>>> ffffffc0801a228c [unknown] ([kernel.kallsyms]) // bpf trampoline
>>> ffffffd08d362590 tcp_ack+0x798 ([kernel.kallsyms]) // caller for bpf trampoline
>>> ffffffd08d3621fc tcp_ack+0x404 ([kernel.kallsyms]) // garbage frame
>>> ffffffd08d36452c tcp_rcv_established+0x4ac ([kernel.kallsyms])
>>> ffffffd08d375c58 tcp_v4_do_rcv+0x1f0 ([kernel.kallsyms])
>>> ffffffd08d378630 tcp_v4_rcv+0xeb8 ([kernel.kallsyms])
>>> ...
>>>
>>> And this sample also shows that there is no symbol for the
>>> struct_ops bpf trampoline. Maybe we should add a symbol for it?
>>>
>>
>> Emm, the stack unwinder on x86 is completely broken for struct_ops
>> progs.
>>
>> That is because the following function returns 0 for a struct_ops bpf
>> trampoline address: there is no corresponding kernel symbol for the
>> trampoline, so the address is not recognized as kernel text, and the
>> unwinder stops at ip == 0.
>>
>> unsigned long unwind_get_return_address(struct unwind_state *state)
>> {
>>         if (unwind_done(state))
>>                 return 0;
>>
>>         return __kernel_text_address(state->ip) ? state->ip : 0;
>> }
>>
>> Here is an example of a broken stack trace from perf sampling, where
>> only one stack frame is captured:
>>
>> ffffffffc000cfb4 bpf_prog_e60d93d3ec88d5ef_bpf_cubic_cong_avoid+0x78 (bpf_prog_e60d93d3ec88d5ef_bpf_cubic_cong_avoid)
>> (no more frames)
> 
> you mean arch_stack_walk() won't see anything after ip=0,
> but dump_stack() will still print the rest with "?" (as unreliable).
>

Yes, dump_stack() does not stop on a !__kernel_text_address() address;
it just prints such entries with a "?" and keeps going.
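
For reference, the arch_stack_walk() side is different: it bails out as
soon as unwind_get_return_address() returns 0. Roughly, from
arch/x86/kernel/stacktrace.c, lightly paraphrased (the struct_ops
comment is mine):

void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
                     struct task_struct *task, struct pt_regs *regs)
{
        struct unwind_state state;
        unsigned long addr;

        if (regs && !consume_entry(cookie, regs->ip))
                return;

        for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
             unwind_next_frame(&state)) {
                addr = unwind_get_return_address(&state);
                /*
                 * a struct_ops trampoline address yields addr == 0 here,
                 * so the walk terminates early
                 */
                if (!addr || !consume_entry(cookie, addr))
                        break;
        }
}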

> That's bad indeed.
> 
>> To fix it, I think a kernel symbol should be added for the struct_ops
>> trampoline.
> 
> Makes sense. Pls send a patch.
> 
> 
> As far as this patch goes, please add the earlier double-tcp_ack trace
> example to the commit log and resubmit targeting bpf-next.
>

Sure
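
For the symbol, what I have in mind is roughly to reuse the existing
bpf_ksym machinery, the same way the regular bpf trampoline does via
bpf_image_ksym_add(). The helper below and the naming scheme are only a
sketch to illustrate the idea; the actual hook point in bpf_struct_ops.c
will be worked out in the patch.

#include <linux/bpf.h>
#include <linux/kallsyms.h>

/*
 * Sketch only: publish the struct_ops trampoline image as a kernel
 * symbol so that __kernel_text_address() recognizes it.  The helper
 * name and the "bpf__<struct_ops>_<member>" naming scheme are
 * placeholders for illustration.
 */
static void bpf_struct_ops_ksym_add(void *image, const char *st_ops_name,
                                    const char *member_name,
                                    struct bpf_ksym *ksym)
{
        snprintf(ksym->name, KSYM_NAME_LEN, "bpf__%s_%s",
                 st_ops_name, member_name);
        /* registers [image, image + PAGE_SIZE) and emits PERF_RECORD_KSYMBOL */
        bpf_image_ksym_add(image, ksym);
}

A matching bpf_ksym_del() would of course be needed when the struct_ops
map is freed; that part is omitted here.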



