[PATCH bpf] bpf, arm64: Fix stack frame construction for struct_ops trampoline

Xu Kuohai xukuohai at huaweicloud.com
Thu Oct 24 06:48:17 PDT 2024


On 10/23/2024 11:16 AM, Xu Kuohai wrote:
> On 10/23/2024 7:37 AM, Puranjay Mohan wrote:
>> On Wed, Oct 23, 2024 at 12:50 AM Alexei Starovoitov
>> <alexei.starovoitov at gmail.com> wrote:
>>>
>>> On Sat, Oct 19, 2024 at 2:15 AM Xu Kuohai <xukuohai at huaweicloud.com> wrote:
>>>>
>>>> From: Xu Kuohai <xukuohai at huawei.com>
>>>>
>>>> The callsite layout for arm64 fentry is:
>>>>
>>>> mov x9, lr
>>>> nop
>>>>
>>>> When a bpf prog is attached, the nop instruction is patched to a call
>>>> to bpf trampoline:
>>>>
>>>> mov x9, lr
>>>> bl <bpf trampoline>
>>>>
>>>> This passes two return addresses to bpf trampoline: the return address
>>>> for the traced function/prog, stored in x9, and the return address for
>>>> the bpf trampoline, stored in lr. To ensure stacktrace works properly,
>>>> the bpf trampoline constructs two fake function stack frames using x9
>>>> and lr.
>>>>
>>>> However, struct_ops progs are used as function callbacks and are invoked
>>>> directly, without x9 being set as the fentry callsite does. Therefore,
>>>> only one stack frame should be constructed using lr for struct_ops.
>>>
>>> Are you saying that currently stack unwinder on arm64 is
>>> completely broken for struct_ops progs ?
>>> or it shows an extra frame that doesn't have to be shown ?
>>>
>>> If former then it's certainly a bpf tree material.
>>> If latter then bpf-next will do.
>>> Pls clarify.
>>
>> It is not completely broken, only an extra garbage frame is shown
>> between the caller of the trampoline and its caller.
>>
> 
> Yep, exactly. Here is a perf script sample, where tcp_ack+0x404
> is the garbage frame.
> 
> ffffffc0801a04b4 bpf_prog_50992e55a0f655a9_bpf_cubic_cong_avoid+0x98 (bpf_prog_50992e55a0f655a9_bpf_cubic_cong_avoid)
> ffffffc0801a228c [unknown] ([kernel.kallsyms]) // bpf trampoline
> ffffffd08d362590 tcp_ack+0x798 ([kernel.kallsyms]) // caller for bpf trampoline
> ffffffd08d3621fc tcp_ack+0x404 ([kernel.kallsyms]) // garbage frame
> ffffffd08d36452c tcp_rcv_established+0x4ac ([kernel.kallsyms])
> ffffffd08d375c58 tcp_v4_do_rcv+0x1f0 ([kernel.kallsyms])
> ffffffd08d378630 tcp_v4_rcv+0xeb8 ([kernel.kallsyms])
> ...
> 
> And this sample also shows that there is no symbol for the
> struct_ops bpf trampoline. Maybe we should add symbol for it?
>

Emm, stack unwinder on x86 is completely broken for struct_ops
progs.

It's because the following function returns 0 for a struct_ops
bpf trampoline address as there is no corresponding kernel symbol,
which causes the address not to be recognized as kerneltext. As
a result, the winder stops on ip == 0.

unsigned long unwind_get_return_address(struct unwind_state *state)
{
         if (unwind_done(state))
                 return 0;

         return __kernel_text_address(state->ip) ? state->ip : 0;
}

Here is an example of broken stack trace from perf sampling, where
only one stack frame is captured:

ffffffffc000cfb4 bpf_prog_e60d93d3ec88d5ef_bpf_cubic_cong_avoid+0x78 (bpf_prog_e60d93d3ec88d5ef_bpf_cubic_cong_avoid)
(no more frames)

To fix it, I think kernel symbol should be added for struct_ops
trampoline.

>> So, this can go from the bpf-next tree. But let's wait for Xu to
>> provide more information.
>>
>> Thanks,
>> Puranjay
>>
> 




More information about the linux-arm-kernel mailing list