[PATCH] um: fix execve stub execution on old host OSs

Yonting Lin linyongting at gmail.com
Fri Jun 6 21:22:48 PDT 2025


Hi Glenn,

On Sun, 12 Jan 2025 14:07:36 -0600
Glenn Washburn <development at efficientek.com> wrote:
>On Fri, 10 Jan 2025 17:13:05 +0100
>Benjamin Berg <benjamin at sipsolutions.net> wrote:
>
>> From: Benjamin Berg <benjamin.berg at intel.com>
>> 
>> The stub execution uses the somewhat new close_range and execveat
>> syscalls. Of these two, the execveat call is essential, but the
>> close_range call is more about stub process hygiene rather than safety
>> (and its result is ignored).
>> 
>> Replace both calls with a raw syscall as older machines might not have a
>> recent enough kernel for close_range (with CLOSE_RANGE_CLOEXEC) or a
>> libc that does not yet expose both of the syscalls.
>> 
>> Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs")
>
>This change fixes the immediate issue, allowing the compile to complete
>successfully.
>
>> Reported-by: Glenn Washburn <development at efficientek.com>
>> Closes: https://lore.kernel.org/20250108022404.05e0de1e@crass-HP-ZBook-15-G2
>> Signed-off-by: Benjamin Berg <benjamin.berg at intel.com>
>> ---
>>  arch/um/os-Linux/skas/process.c | 16 +++++++++++++---
>>  1 file changed, 13 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
>> index f683cfc9e51a..64e89921bc7f 100644
>> --- a/arch/um/os-Linux/skas/process.c
>> +++ b/arch/um/os-Linux/skas/process.c
>> @@ -181,6 +181,10 @@ extern char __syscall_stub_start[];
>>  
>>  static int stub_exe_fd;
>>  
>> +#ifndef CLOSE_RANGE_CLOEXEC
>> +#define CLOSE_RANGE_CLOEXEC	(1U << 2)
>> +#endif
>> +
>>  static int userspace_tramp(void *stack)
>>  {
>>  	char *const argv[] = { "uml-userspace", NULL };
>> @@ -202,8 +206,12 @@ static int userspace_tramp(void *stack)
>>  	init_data.stub_data_fd = phys_mapping(uml_to_phys(stack), &offset);
>>  	init_data.stub_data_offset = MMAP_OFFSET(offset);
>>  
>> -	/* Set CLOEXEC on all FDs and then unset on all memory related FDs */
>> -	close_range(0, ~0U, CLOSE_RANGE_CLOEXEC);
>> +	/*
>> +	 * Avoid leaking unneeded FDs to the stub by setting CLOEXEC on all FDs
>> +	 * and then unsetting it on all memory related FDs.
>> +	 * This is not strictly necessary from a safety perspective.
>> +	 */
>> +	syscall(__NR_close_range, 1, ~0U, CLOSE_RANGE_CLOEXEC);
>
>Was this intentional to change the fd parameter from 0 to 1?
>
>>  
>>  	fcntl(init_data.stub_data_fd, F_SETFD, 0);
>>  	for (iomem = iomem_regions; iomem; iomem = iomem->next)
>> @@ -224,7 +232,9 @@ static int userspace_tramp(void *stack)
>>  	if (ret != sizeof(init_data))
>>  		exit(4);
>>  
>> -	execveat(stub_exe_fd, "", argv, NULL, AT_EMPTY_PATH);
>> +	/* Raw execveat for compatibility with older libc versions */
>> +	syscall(__NR_execveat, stub_exe_fd, (unsigned long)"",
>> +		(unsigned long)argv, NULL, AT_EMPTY_PATH);
>
>I think it would look nicer to leave the call unchanged and define a stub
>function like libc does, but only if we detect that the stubs are
>undefined. I have a glibc specific patch for this, and then realized
>that it should be more generic to cover other libcs. So this patch here
>is more general. I think to have what I'd like, I'd need to add and run
>test binaries that check for these functions, like autotools does.
>I'm not aware that this is really done in the kernel build system,
>though I do know that there are binaries built during build for
>generating binaries to be included in the kernel. So in theory I don't
>think it should be too much trouble to do. Basically the idea would be
>to have a system for testing host libc and outputting a config.h which
>will define for instance, HAVE_EXECVEAT, if support is detected in the
>linked libc for execveat. And in process.c define a stub around the
>syscall() for execveat if not defined HAVE_EXECVEAT.
>
>So perhaps this is the best solution for now, but ultimately it would
>be nice to have the above and revert these changes.
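Glenn's suggestion could look roughly like the sketch below. It assumes a hypothetical configure-style probe that writes HAVE_EXECVEAT into a generated config.h when the host libc provides execveat(). Neither the probe nor that macro exists in the kernel build today. The call site in userspace_tramp() would stay as a plain execveat(...). Only when the macro is missing does a local wrapper (the name uml_execveat is hypothetical) issue the raw syscall, the same way the patch does. The call is routed through a #define rather than a static function named execveat. That avoids a conflicting-declaration error when the libc headers already declare execveat() but the probe did not set the macro.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_EMPTY_PATH */
#include <sys/syscall.h>  /* __NR_execveat */
#include <unistd.h>       /* syscall() */

/*
 * HAVE_EXECVEAT is assumed to come from a hypothetical generated config.h
 * produced by probing the host libc at build time.
 */
#ifndef HAVE_EXECVEAT
/* Hypothetical fallback: issue the raw syscall, as the patch above does. */
static int uml_execveat(int dirfd, const char *pathname, char *const argv[],
			char *const envp[], int flags)
{
	return (int)syscall(__NR_execveat, dirfd, pathname, argv, envp, flags);
}
/* Redirect unchanged call sites such as execveat(fd, "", argv, NULL, ...). */
#define execveat uml_execveat
#endif
```

With this in place, the original line `execveat(stub_exe_fd, "", argv, NULL, AT_EMPTY_PATH);` would compile unchanged on both old and new libcs.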

Sorry for the belated enquiry. I didn't want to start a new thread with a
patch, since this is just a short question.

Currently (as of commit 119b1e61a769aa98e68599f44721661a4d8c55f3), three places
in the code invoke the __NR_close_range syscall:

# grep __NR_close_range -r arch/um/* -n
arch/um/kernel/skas/stub_exe.c:73:		res = stub_syscall3(__NR_close_range, 1, ~0U, 0);
arch/um/os-Linux/start_up.c:269:	if (stub_syscall3(__NR_close_range, 1, ~0U, 0))
arch/um/os-Linux/skas/process.c:331:	syscall(__NR_close_range, 0, ~0U, CLOSE_RANGE_CLOEXEC);

However, my host kernel is 5.4, and these pieces of code fail to compile
because the syscall does not exist in that older kernel (close_range was only
added in Linux 5.9), so its headers do not define __NR_close_range.
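One possible direction, sketched under stated assumptions rather than as a proposed patch, has two parts. First, define __NR_close_range when the host headers lack it. The value 436 is assumed here because it is close_range's number in the syscall tables shared by x86_64 and i386. Second, fall back to setting FD_CLOEXEC with fcntl() when the running kernel rejects the call. Kernels before 5.9 return ENOSYS. Kernels 5.9 and 5.10 know close_range but not CLOSE_RANGE_CLOEXEC (added in 5.11), so they return EINVAL. The helper name cloexec_from() is hypothetical and not part of arch/um.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Assumption: 436 is close_range's number on x86_64 and i386. Only used
 * when the host headers predate the syscall.
 */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif

#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/* Hypothetical helper: mark every fd >= first as close-on-exec. */
static void cloexec_from(unsigned int first)
{
	struct rlimit rl;
	unsigned int fd, max = 1024;

	/* Fast path: kernels >= 5.11 support CLOSE_RANGE_CLOEXEC. */
	if (syscall(__NR_close_range, first, ~0U, CLOSE_RANGE_CLOEXEC) == 0)
		return;

	/*
	 * Slow path (ENOSYS, EINVAL or any other failure): walk the fd table
	 * up to the soft RLIMIT_NOFILE limit, or 1024 if it is unknown.
	 */
	if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY &&
	    rl.rlim_cur <= UINT_MAX)
		max = (unsigned int)rl.rlim_cur;

	for (fd = first; fd < max; fd++) {
		int flags = fcntl((int)fd, F_GETFD);

		/* fcntl() fails with EBADF for unused slots; skip those. */
		if (flags >= 0)
			fcntl((int)fd, F_SETFD, flags | FD_CLOEXEC);
	}
}
```

This sketch only mirrors the process.c caller, which passes CLOSE_RANGE_CLOEXEC. The stub_exe.c and start_up.c callers pass flags 0, which closes the descriptors, and they go through stub_syscall3() rather than libc. A fallback there would need to call close() in the loop instead and would have to avoid the libc-based getrlimit()/fcntl() wrappers.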

Are you already working on this issue? If not, I will try to put together a
draft solution for review.

Thanks!
Yongting
>
>Glenn
>
>  
>>  	exit(5);
>>  }



More information about the linux-um mailing list