[PATCHv3 3/9] bpf: Introduce bpf_copy_to_kernel() to buffer the content from bpf-prog
Philipp Rudo
prudo at redhat.com
Wed Jun 25 11:10:33 PDT 2025
Hi Pingfan,
Hi Alexei,
sorry for the late reply.
On Thu, 29 May 2025 12:17:38 +0800
Pingfan Liu <piliu at redhat.com> wrote:
> In the security kexec_file_load case, the buffer which holds the kernel
> image is invisible to the userspace.
>
> The common data flow in bpf scheme is from kernel to bpf-prog. In the
> case of kexec_file_load, the kexec component needs to buffer the parsed
> result by bpf-prog (opposite the usual direction) to the next stage
> parsing. bpf_kexec_carrier() makes the opposite data flow possible. A
> bpf-prog can publish the parsed payload address to the kernel, and the
> latter can copy them for future use.
>
> Signed-off-by: Pingfan Liu <piliu at redhat.com>
> Cc: Alexei Starovoitov <ast at kernel.org>
> Cc: Daniel Borkmann <daniel at iogearbox.net>
> Cc: John Fastabend <john.fastabend at gmail.com>
> Cc: Andrii Nakryiko <andrii at kernel.org>
> Cc: Martin KaFai Lau <martin.lau at linux.dev>
> Cc: Eduard Zingerman <eddyz87 at gmail.com>
> Cc: Song Liu <song at kernel.org>
> Cc: Yonghong Song <yonghong.song at linux.dev>
> Cc: KP Singh <kpsingh at kernel.org>
> Cc: Stanislav Fomichev <sdf at fomichev.me>
> Cc: Hao Luo <haoluo at google.com>
> Cc: Jiri Olsa <jolsa at kernel.org>
> To: bpf at vger.kernel.org
> ---
> include/linux/bpf.h | 23 +++++
> kernel/bpf/Makefile | 2 +-
> kernel/bpf/helpers.c | 2 +
> kernel/bpf/helpers_carrier.c | 194 +++++++++++++++++++++++++++++++++++
> 4 files changed, 220 insertions(+), 1 deletion(-)
> create mode 100644 kernel/bpf/helpers_carrier.c
>
[...]
> diff --git a/kernel/bpf/helpers_carrier.c b/kernel/bpf/helpers_carrier.c
> new file mode 100644
> index 0000000000000..c4e45fdf0ebb8
> --- /dev/null
> +++ b/kernel/bpf/helpers_carrier.c
> @@ -0,0 +1,194 @@
[...]
> +__bpf_kfunc int bpf_mem_range_result_put(struct mem_range_result *result)
I'm concerned about the use of kfuncs for our use case. I don't believe
they provide the stability we need.
With kexec we deal with two different kernels. The 1st kernel, aka. the
one that executes kexec to load the 2nd kernel, and the 2nd kernel that
is being loaded. In general both kernels are built from different
versions with different configs and it is expected that kexec works
even when both kernels are years apart.
The problem is that in our design the bpf-prog is part of the
image of and built from the sources of the 2nd kernel, but runs in the
1st kernel. So the definitions of the kfuncs in both kernels have to
match. What makes it worse is that for it to work with secure boot the
kernel image, including the bpf-prog, needs to be signed. Which means
that the bpf-prog is fixed after build and can no longer be updated.
All in all I'm afraid we need a uapi-like stability for those kfuncs
for our design to work. Do you have any comments on my concern? Or any
idea how we could archive the stability despite using kfuncs?
Thanks
Philipp
> +{
> + return mem_range_result_put(result);
> +}
> +
> +/*
> + * Cache the content in @buf into kernel
> + */
> +__bpf_kfunc int bpf_copy_to_kernel(const char *name, char *buf, int size)
> +{
> + struct mem_range_result *range;
> + struct mem_cgroup *memcg, *old_memcg;
> + struct str_listener *item;
> + resource_handler handler;
> + bool kmalloc;
> + char *kbuf;
> + int id, ret = 0;
> +
> + id = srcu_read_lock(&srcu);
> + item = find_listener(name);
> + if (!item) {
> + srcu_read_unlock(&srcu, id);
> + return -EINVAL;
> + }
> + kmalloc = item->kmalloc;
> + handler = item->handler;
> + srcu_read_unlock(&srcu, id);
> + memcg = get_mem_cgroup_from_current();
> + old_memcg = set_active_memcg(memcg);
> + range = kmalloc(sizeof(struct mem_range_result), GFP_KERNEL);
> + if (!range) {
> + pr_err("fail to allocate mem_range_result\n");
> + ret = -ENOMEM;
> + goto err;
> + }
> +
> + kref_init(&range->ref);
> + if (item->kmalloc)
> + kbuf = kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);
> + else
> + kbuf = __vmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);
> + if (!kbuf) {
> + kfree(range);
> + ret = -ENOMEM;
> + goto err;
> + }
> + ret = copy_from_kernel_nofault(kbuf, buf, size);
> + if (unlikely(ret < 0)) {
> + kfree(range);
> + if (item->kmalloc)
> + kfree(kbuf);
> + else
> + vfree(kbuf);
> + ret = -EINVAL;
> + goto err;
> + }
> + range->kmalloc = item->kmalloc;
> + range->buf = kbuf;
> + range->buf_sz = size;
> + range->data_sz = size;
> + range->memcg = memcg;
> + mem_cgroup_tryget(memcg);
> + range->status = 0;
> + ret = handler(name, range);
> + mem_range_result_put(range);
> +err:
> + set_active_memcg(old_memcg);
> + mem_cgroup_put(memcg);
> + return ret;
> +}
> +
> +int register_carrier_listener(struct carrier_listener *listener)
> +{
> + struct str_listener *item;
> + unsigned int hash;
> + int ret;
> +
> + if (!listener->name)
> + return -EINVAL;
> + item = kmalloc(sizeof(*item), GFP_KERNEL);
> + if (!item)
> + return -ENOMEM;
> + item->str = kstrdup(listener->name, GFP_KERNEL);
> + if (!item->str) {
> + kfree(item);
> + return -ENOMEM;
> + }
> + item->handler = listener->handler;
> + item->kmalloc = listener->kmalloc;
> + hash = jhash(item->str, strlen(item->str), 0);
> + mutex_lock(&str_listeners_mutex);
> + if (!find_listener(item->str)) {
> + hash_add(str_listeners, &item->node, hash);
> + } else {
> + kfree(item->str);
> + kfree(item);
> + ret = -EBUSY;
> + }
> + mutex_unlock(&str_listeners_mutex);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(register_carrier_listener);
> +
> +int unregister_carrier_listener(char *str)
> +{
> + struct str_listener *item;
> + int ret = 0;
> +
> + mutex_lock(&str_listeners_mutex);
> + item = find_listener(str);
> + if (!!item)
> + hash_del(&item->node);
> + else
> + ret = -EINVAL;
> + mutex_unlock(&str_listeners_mutex);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(unregister_carrier_listener);
> +
More information about the kexec
mailing list