WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342

Fri Aug 25 12:49:35 PDT 2023

On Fri, Aug 25, 2023 at 11:53 AM Alexei Starovoitov
<alexei.starovoitov at gmail.com> wrote:
>
> On Fri, Aug 25, 2023 at 8:28 AM Yonghong Song <yonghong.song at linux.dev> wrote:
> >
> >
> >
> > On 8/25/23 3:32 AM, Björn Töpel wrote:
> > > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> > > selftests on bpf-next 9e3b47abeb8f.
> > >
> > > I'm able to reproduce the hang by multiple runs of:
> > >   | ./test_progs -a link_api -a linked_list
> > > I'm currently investigating that.
> > >
> > > But! Sometimes (every blue moon) I get a warn_on_once hit:
> > >   | ------------[ cut here ]------------
> > >   | WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342 bpf_mem_refill+0x1fc/0x206
> > >   | Modules linked in: bpf_testmod(OE)
> > >   | CPU: 3 PID: 261 Comm: test_progs-cpuv Tainted: G           OE    N 6.5.0-rc5-01743-gdcb152bb8328 #2
> > >   | Hardware name: riscv-virtio,qemu (DT)
> > >   | epc : bpf_mem_refill+0x1fc/0x206
> > >   |  ra : irq_work_single+0x68/0x70
> > >   | epc : ffffffff801b1bc4 ra : ffffffff8015fe84 sp : ff2000000001be20
> > >   |  gp : ffffffff82d26138 tp : ff6000008477a800 t0 : 0000000000046600
> > >   |  t1 : ffffffff812b6ddc t2 : 0000000000000000 s0 : ff2000000001be70
> > >   |  s1 : ff5ffffffffe8998 a0 : ff5ffffffffe8998 a1 : ff600003fef4b000
> > >   |  a2 : 000000000000003f a3 : ffffffff80008250 a4 : 0000000000000060
> > >   |  a5 : 0000000000000080 a6 : 0000000000000000 a7 : 0000000000735049
> > >   |  s2 : ff5ffffffffe8998 s3 : 0000000000000022 s4 : 0000000000001000
> > >   |  s5 : 0000000000000007 s6 : ff5ffffffffe8570 s7 : ffffffff82d6bd30
> > >   |  s8 : 000000000000003f s9 : ffffffff82d2c5e8 s10: 000000000000ffff
> > >   |  s11: ffffffff82d2c5d8 t3 : ffffffff81ea8f28 t4 : 0000000000000000
> > >   |  t5 : ff6000008fd28278 t6 : 0000000000040000
> > >   | status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > >   | [<ffffffff801b1bc4>] bpf_mem_refill+0x1fc/0x206
> > >   | [<ffffffff8015fe84>] irq_work_single+0x68/0x70
> > >   | [<ffffffff8015feb4>] irq_work_run_list+0x28/0x36
> > >   | [<ffffffff8015fefa>] irq_work_run+0x38/0x66
> > >   | [<ffffffff8000828a>] handle_IPI+0x3a/0xb4
> > >   | [<ffffffff800a5c3a>] handle_percpu_devid_irq+0xa4/0x1f8
> > >   | [<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
> > >   | [<ffffffff800ae570>] ipi_mux_process+0xac/0xfa
> > >   | [<ffffffff8000a8ea>] sbi_ipi_handle+0x2e/0x88
> > >   | [<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
> > >   | [<ffffffff807ee70e>] riscv_intc_irq+0x36/0x4e
> > >   | [<ffffffff812b5d3a>] handle_riscv_irq+0x54/0x86
> > >   | [<ffffffff812b6904>] do_irq+0x66/0x98
> > >   | ---[ end trace 0000000000000000 ]---
> > >
> > > Code:
> > >   | static void free_bulk(struct bpf_mem_cache *c)
> > >   | {
> > >   |   struct bpf_mem_cache *tgt = c->tgt;
> > >   |   struct llist_node *llnode, *t;
> > >   |   unsigned long flags;
> > >   |   int cnt;
> > >   |
> > >   |   WARN_ON_ONCE(tgt->unit_size != c->unit_size);
> > >   | ...
> > >
> > > I'm not well versed in the memory allocator; Before I dive into it --
> > > has anyone else hit it? Ideas on why the warn_on_once is hit?
> >
> > Maybe take a look at the patch
> >    822fb26bdb55  bpf: Add a hint to allocated objects.
> >
> > In the above patch, we have
> >
> > +       /*
> > +        * Remember bpf_mem_cache that allocated this object.
> > +        * The hint is not accurate.
> > +        */
> > +       c->tgt = *(struct bpf_mem_cache **)llnode;
> >
> > I suspect that the warning may be related to the above.
> > I tried the above ./test_progs command line (running multiple
> > at the same time) and didn't trigger the issue.
>
> Interesting. Thanks for the report.
> I wasn't able to repo the warn either, but I can confirm that linked_list
> test is super slow on the kernel with debugs on and may appear to "hang",
> since it doesn't respond to key press for many seconds.
>
> time ./test_progs -a linked_list
> Summary: 1/132 PASSED, 0 SKIPPED, 0 FAILED
>
> real    0m35.897s
> user    0m1.860s
> sys    0m32.105s
>
> That's not normal.
>
> Kumar, Dave,
>
> do you have any suggestions?

Andrii,

this issue seems to be a scalability issue with libbpf.
The perf report looks like:

    9.89%  test_progs       [kernel.kallsyms]   [k] __lock_acquire
    7.70%  test_progs       [kernel.kallsyms]   [k] check_preemption_disabled
    2.61%  test_progs       [kernel.kallsyms]   [k] asm_exc_page_fault
    2.32%  test_progs       [kernel.kallsyms]   [k] rcu_is_watching
    2.30%  test_progs       [kernel.kallsyms]   [k] memcpy_orig
    2.26%  test_progs       test_progs          [.] btf_find_by_name_kind
    2.18%  test_progs       libc-2.28.so        [.] __strcmp_avx2
    2.14%  test_progs       [kernel.kallsyms]   [k] lock_acquire
    2.10%  test_progs       [kernel.kallsyms]   [k] mark_lock.part.48
    2.07%  test_progs       test_progs          [.] btf_kind
    2.05%  test_progs       [kernel.kallsyms]   [k] clear_page_erms
    1.89%  test_progs       test_progs          [.] btf_type_by_id
    1.83%  test_progs       [kernel.kallsyms]   [k] lock_is_held_type
    1.75%  test_progs       [kernel.kallsyms]   [k] lock_release
    1.31%  test_progs       [kernel.kallsyms]   [k] unwind_next_frame
    1.29%  test_progs       test_progs          [.] btf__type_by_id
    1.28%  test_progs       [kernel.kallsyms]   [k] rep_movs_alternative
    1.09%  test_progs       [kernel.kallsyms]   [k] __orc_find
    1.06%  swapper          [kernel.kallsyms]   [k] __lock_acquire
    1.02%  test_progs       test_progs          [.] btf_type_size
    0.84%  test_progs       test_progs          [.] btf_parse_type_sec
    0.78%  test_progs       [kernel.kallsyms]   [k] __create_object
    0.76%  test_progs       [kernel.kallsyms]   [k] __lookup_object
    0.75%  test_progs       test_progs          [.] btf__str_by_offset

The top 5 functions coming from kernel due to libbpf
reading /sys/kernel/btf/vmlinux
Then libbpf proceeded to do a search in btf.

Don't know whether libbpf is doing too many unnecessary reads on vmlinux btf
or what is actually happening, but something isn't right.
This test shouldn't be causing libbpf to be the main consumer of cpu.
Not sure whether other tests are similarly affected.
This is what I've debugged so far.