[PATCH bpf v2] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE
Levi Zim via B4 Relay
devnull+rsworktech.outlook.com at kernel.org
Sat Mar 14 08:31:28 PDT 2026
From: Levi Zim <rsworktech at outlook.com>
kmalloc_nolock always fails for architectures that lack cmpxchg16b.
For example, this causes bpf_task_storage_get with flag
BPF_LOCAL_STORAGE_GET_F_CREATE to fails on riscv64 6.19 kernel.
Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
But leave the PREEMPT_RT case as is because it requires kmalloc_nolock
for correctness. Add a comment about this limitation that architecture's
lack of CMPXCHG_DOUBLE combined with PREEMPT_RT could make
bpf_local_storage_alloc always fail.
Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
Cc: stable at vger.kernel.org
Signed-off-by: Levi Zim <rsworktech at outlook.com>
---
I find that bpf_task_storage_get with flag BPF_LOCAL_STORAGE_GET_F_CREATE
always fails for me on 6.19 kernel on riscv64 and bisected it.
In f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock()
in local storage"), bpf memory allocator is replaced with kmalloc_nolock.
This approach is problematic for architectures that lack CMPXCHG_DOUBLE
because kmalloc_nolock always fails in this case:
In function kmalloc_nolock (kmalloc_nolock_noprof):
if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
/*
* kmalloc_nolock() is not supported on architectures that
* don't implement cmpxchg16b, but debug caches don't use
* per-cpu slab and per-cpu partial slabs. They rely on
* kmem_cache_node->list_lock, so kmalloc_nolock() can
* attempt to allocate from debug caches by
* spin_trylock_irqsave(&n->list_lock, ...)
*/
return NULL;
Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
(But not for a PREEMPT_RT case as explained in the comment and commitmsg)
Note for stable: this only needs to be picked into v6.19 if the patch
makes it into 7.0.
---
Changes in v2:
- Drop the modification to the PREEMPT_RT case as it requires
kmalloc_nolock for correctness.
- Add a comment to the PREEMPT_RT case about the limitation when
not HAVE_CMPXCHG_DOUBLE but enables PREEMPT_RT.
- Link to v1: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v1-1-24abf3f75a9f@outlook.com
---
include/linux/bpf_local_storage.h | 2 ++
kernel/bpf/bpf_cgrp_storage.c | 2 +-
kernel/bpf/bpf_local_storage.c | 4 ++++
kernel/bpf/bpf_task_storage.c | 2 +-
4 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 8157e8da61d40..a7ae5dde15bcb 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -19,6 +19,8 @@
#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
+static const bool KMALLOC_NOLOCK_SUPPORTED = IS_ENABLED(CONFIG_HAVE_CMPXCHG_DOUBLE);
+
struct bpf_local_storage_map_bucket {
struct hlist_head list;
rqspinlock_t lock;
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index c2a2ead1f466d..09c557e426968 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -114,7 +114,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
{
- return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
+ return bpf_local_storage_map_alloc(attr, &cgroup_cache, KMALLOC_NOLOCK_SUPPORTED);
}
static void cgroup_storage_map_free(struct bpf_map *map)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 9c96a4477f81a..a6c240da87668 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -893,6 +893,10 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
/* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
* preemptible context. Thus, enforce all storages to use
* kmalloc_nolock() when CONFIG_PREEMPT_RT is enabled.
+ *
+ * However, kmalloc_nolock would fail on architectures that do not
+ * have CMPXCHG_DOUBLE. On such architectures with PREEMPT_RT,
+ * bpf_local_storage_alloc would always fail.
*/
smap->use_kmalloc_nolock = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : use_kmalloc_nolock;
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 605506792b5b4..9f74fd1ef7f46 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -212,7 +212,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
{
- return bpf_local_storage_map_alloc(attr, &task_cache, true);
+ return bpf_local_storage_map_alloc(attr, &task_cache, KMALLOC_NOLOCK_SUPPORTED);
}
static void task_storage_map_free(struct bpf_map *map)
---
base-commit: e06e6b8001233241eb5b2e2791162f0585f50f4b
change-id: 20260314-bpf-kmalloc-nolock-60da80e613de
Best regards,
--
Levi Zim <rsworktech at outlook.com>
More information about the linux-riscv
mailing list