[C/R ARM][PATCH 2/3] ARM: Add the eclone system call

Wed Mar 24 14:19:07 EDT 2010

Russell King - ARM Linux [linux at arm.linux.org.uk] wrote:
| > +asmlinkage int sys_eclone(unsigned flags_low, struct clone_args __user *uca,
| > +			  int args_size, pid_t __user *pids,
| > +			  struct pt_regs *regs)
| > +{
| > +	int rc;
| > +	struct clone_args kca;
| > +	unsigned long flags;
| > +	int __user *parent_tidp;
| > +	int __user *child_tidp;
| > +	unsigned long __user stack;
| 
| __user on an integer type doesn't make any sense; integer types do not
| have address spaces.

Ah, will fix  that for x86 32/64 bit implementations.

| 
| > +	unsigned long stack_size;
| > +
| > +	rc = fetch_clone_args_from_user(uca, args_size, &kca);
| > +	if (rc)
| > +		return rc;
| > +
| > +	/*
| > +	 * TODO: Convert 'clone-flags' to 64-bits on all architectures.
| > +	 * TODO: When ->clone_flags_high is non-zero, copy it in to the
| > +	 * 	 higher word(s) of 'flags':
| > +	 *
| > +	 * 		flags = (kca.clone_flags_high << 32) | flags_low;
| > +	 */
| > +	flags = flags_low;
| > +	parent_tidp = (int *)(unsigned long)kca.parent_tid_ptr;
| > +	child_tidp = (int *)(unsigned long)kca.child_tid_ptr;
| 
| This will produce sparse errors.  Is there a reason why 'clone_args'
| tid pointers aren't already pointers marked with __user ?

Making them pointers would make them 32-bit on some and 64-bit on other
architectures. We wanted the fields to be the same size on all architectures
for easier portability/extensibility. Given that we are copying-in a structure
from user-space, copying in a few extra bytes on the 32-bit architectures
would not be significant.

| 
| > +
| > +	stack_size = (unsigned long)kca.child_stack_size;
| 
| Shouldn't this already be of integer type?
| 
| > +	if (stack_size)
| > +		return -EINVAL;
| 
| So the stack must have a zero size?  Is this missing a '!' ?

Some architectures (IA64 ?) use the stack-size field. Those that don't
need it should error out. Again, its because we are trying to keep the
interface common across architectures.

| 
| > +
| > +	stack = (unsigned long)kca.child_stack;
| > +	if (!stack)
| > +		stack = regs->ARM_sp;
| > +
| > +	return do_fork_with_pids(flags, stack, regs, stack_size, parent_tidp,
| > +				child_tidp, kca.nr_pids, pids);
| 
| Hmm, so let me get this syscall interface right.  We have some arguments
| passed in registers and others via a (variable sized?) structure.  It seems
| really weird to have, eg, a pointer to the pids and the number of pids
| passed in two separate ways.
| 
| The grouping between what's passed in registers and via this clone_args
| structure seems to be random.  Can it be sanitized?

:-) Well, we went through a lot of discussions on this. Here is one pointer
http://lkml.org/lkml/2009/9/11/92 (and that was version 6 of 13+ :-).

We wanted the first parameter of eclone(), flags, to remain 32-bit value to
avoid confusing user-space. (i.e if they accidentally pass a 64-bit flags
value to the clone() system call which takes 32-bit flags, the higher-32
bits would silently be dropped). 

By sticking the higher clone-flags in 'struct clone_args', all architectures
would have to do the same extra work to set the higher flags so less
chance of error.

Re: passing the number of pids, nr_pids in the structure, I think it was
just that by passing it in the structure, we could avoid using an extra
register for the system call parameter.

'pid_t *' would be of different size on different architecutres. Passing
it as a separate parameter would avoid the pointer conversion.  Note
that unlike the tid pointers above which are a few individual fields,
the pid_t array could in theory be large.

Hope that helps. Thanks the review comments.

Sukadev