[PATCH bpf-next v2 1/2] bpf, arm64: Remove redundant bpf_flush_icache() after pack allocator finalize
Xu Kuohai
xukuohai at huaweicloud.com
Tue Apr 14 04:16:33 PDT 2026
On 4/14/2026 5:38 PM, Puranjay Mohan wrote:
> On Tue, Apr 14, 2026 at 2:56 AM Xu Kuohai <xukuohai at huaweicloud.com> wrote:
>>
>> On 4/14/2026 3:11 AM, Puranjay Mohan wrote:
>>> bpf_flush_icache() calls flush_icache_range() to clean the data cache
>>> and invalidate the instruction cache for the JITed code region. However,
>>> since commit 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory
>>> management"), this flush is redundant.
>>>
>>> bpf_jit_binary_pack_finalize() copies the JITed instructions to the ROX
>>> region via bpf_arch_text_copy() -> aarch64_insn_copy() -> __text_poke(),
>>> and __text_poke() already calls flush_icache_range() on the written
>>> range. The subsequent bpf_flush_icache() repeats the same cache
>>> maintenance on an overlapping range, including an unnecessary second
>>> synchronous IPI to all CPUs via kick_all_cpus_sync().
>>>
>>
>> So the icache is flushed twice: once per instruction and again after all
>> instructions are copied. I think it's better to remove the per-instruction
>> flush and keep the single final flush, to avoid repeating the flush
>> overhead for each instruction.
>
> No. bpf_jit_binary_pack_finalize() is called once, after the whole
> program has been JITed, and it calls bpf_arch_text_copy(ro_header,
> rw_header, rw_header->size). That calls aarch64_insn_copy(dst, src,
> len), which calls __text_poke(); __text_poke() copies the whole program
> and then calls flush_icache_range((uintptr_t)addr, (uintptr_t)addr + len)
> exactly once. That is sufficient, so we don't need to call
> flush_icache_range() on the same range again afterwards.
>
> If we were calling flush_icache_range() for each instruction, the
> system would hang due to the resulting storm of IPIs.
Right, thanks for the explanation!
LGTM.
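To make the flush counts discussed in this thread concrete, here is a small user-space C model. It is a sketch under stated assumptions, not kernel code: model_flush_icache_range(), model_text_poke(), count_ipis() and the flush_mode values are hypothetical names, and each flush is assumed to cost exactly one all-CPU IPI, matching the kick_all_cpus_sync() call described in the patch. It compares the pre-patch finalize path (the __text_poke() flush plus the extra bpf_flush_icache()), the post-patch path, and a hypothetical per-instruction flush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not kernel code: counts IPI broadcasts, assuming
 * every flush_icache_range() ends with one kick_all_cpus_sync(). */
static unsigned long ipi_count;

enum flush_mode {
    MODE_PRE_PATCH,  /* __text_poke() flush + extra bpf_flush_icache() */
    MODE_POST_PATCH, /* __text_poke() flush only */
    MODE_PER_INSN,   /* hypothetical: poke and flush one insn at a time */
};

/* Stand-in for flush_icache_range(): the real one cleans the D-cache and
 * invalidates the I-cache over [start, end), then IPIs all CPUs. Here we
 * only count the IPI. */
static void model_flush_icache_range(uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    ipi_count++;
}

/* Stand-in for __text_poke(): copy the buffer, then flush the written
 * range once. */
static void model_text_poke(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    model_flush_icache_range((uintptr_t)dst, (uintptr_t)dst + len);
}

/* Returns the number of IPI broadcasts needed to publish ninsns 4-byte
 * AArch64 instructions from the RW image to the ROX copy. */
unsigned long count_ipis(enum flush_mode mode, size_t ninsns)
{
    enum { MAX_INSNS = 4096 };
    static uint32_t rw[MAX_INSNS], ro[MAX_INSNS]; /* RW image, ROX copy */
    size_t len = ninsns * sizeof(uint32_t);

    assert(ninsns <= MAX_INSNS);
    ipi_count = 0;

    switch (mode) {
    case MODE_PRE_PATCH:
        model_text_poke(ro, rw, len);                                 /* finalize */
        model_flush_icache_range((uintptr_t)ro, (uintptr_t)ro + len); /* bpf_flush_icache() */
        break;
    case MODE_POST_PATCH:
        model_text_poke(ro, rw, len); /* finalize only */
        break;
    case MODE_PER_INSN:
        for (size_t i = 0; i < ninsns; i++)
            model_text_poke(&ro[i], &rw[i], sizeof(rw[i]));
        break;
    }
    return ipi_count;
}
```

For a 1000-instruction program, this model gives 2 broadcasts for MODE_PRE_PATCH, 1 for MODE_POST_PATCH, and 1000 for MODE_PER_INSN. The last case is the IPI storm Puranjay describes, and the first shows the single redundant broadcast the patch removes.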
More information about the linux-riscv mailing list