[PATCH 7/8] sched_ext: Sub-allocator over kernel-claimed BPF arena pages

Andrea Righi arighi at nvidia.com
Thu May 21 00:56:05 PDT 2026


Hi Tejun,

On Wed, May 20, 2026 at 01:50:51PM -1000, Tejun Heo wrote:
> Build a per-scheduler sub-allocator on top of pages claimed from the BPF
> arena registered in the previous patch. Subsequent kernel-managed
> arena-resident structures (e.g. per-CPU set_cmask cmask) carve their storage
> from this pool.
> 
> scx_arena_pool_init() creates a gen_pool. scx_arena_alloc() returns the
> kernel VA. On exhaustion, the pool grows by claiming more pages via
> bpf_arena_alloc_pages_sleepable(). Chunks are added at the kernel-side
> mapping address; callers translate to the BPF-arena form themselves if
> needed.
> 
> Allocations sleep (GFP_KERNEL) - they may grow the pool through vzalloc and
> arena page allocation. All current consumers run from the enable path (after
> ops.init() and the kernel-side arena auto-discovery, before validate_ops()),
> where sleeping is fine.
> 
> scx_arena_pool_destroy() walks each chunk, returns outstanding ranges to the
> gen_pool with gen_pool_free() and then calls gen_pool_destroy(). The
> underlying arena pages are released when the arena map itself is torn down,
> so the pool destroy doesn't free them explicitly.
> 
> Signed-off-by: Tejun Heo <tj at kernel.org>
> ---

...

> +/*
> + * Allocate @size bytes from the arena pool. Returns kernel VA on success, NULL
> + * on failure. May grow the pool via scx_arena_grow() which sleeps. Caller must
> + * be in a GFP_KERNEL context.
> + */
> +void *scx_arena_alloc(struct scx_sched *sch, size_t size)
> +{
> +	unsigned long kern_va;
> +	u32 page_cnt;
> +
> +	might_sleep();
> +
> +	if (!sch->arena_pool)
> +		return NULL;
> +
> +	kern_va = gen_pool_alloc(sch->arena_pool, size);
> +	if (!kern_va) {
> +		page_cnt = max_t(u32, SCX_ARENA_GROW_PAGES,
> +				 (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
> +		if (scx_arena_grow(sch, page_cnt))
> +			return NULL;
> +		kern_va = gen_pool_alloc(sch->arena_pool, size);
> +		if (!kern_va)
> +			return NULL;

IIUC, since @page_cnt is sized to cover @size and the new chunk is added empty
to the pool, gen_pool_alloc() here should always succeed. Should we do:

  if (WARN_ON_ONCE(!kern_va))
      return NULL;

to catch potential logical bugs / future concurrency / exotic configurations?

Thanks,
-Andrea



More information about the linux-arm-kernel mailing list