[PATCH v4 4/7] uaccess-buffer: add CONFIG_GENERIC_ENTRY support

Sat Dec 11 03:50:14 PST 2021

Peter,

On Thu, Dec 09 2021 at 14:15, Peter Collingbourne wrote:
> @@ -197,14 +201,19 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  static void exit_to_user_mode_prepare(struct pt_regs *regs)
>  {
>  	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +	bool uaccess_buffer_pending;
>  
>  	lockdep_assert_irqs_disabled();
>  
>  	/* Flush pending rcuog wakeup before the last need_resched() check */
>  	tick_nohz_user_enter_prepare();
>  
> -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> +	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK)) {
> +		bool uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
> +
>  		ti_work = exit_to_user_mode_loop(regs, ti_work);
> +		uaccess_buffer_post_exit_loop(uaccess_buffer_pending);

What? Let me look at the these two functions, which are so full of useful
comments:

> +bool __uaccess_buffer_pre_exit_loop(void)
> +{
> +	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +	struct uaccess_descriptor __user *desc_ptr;
> +	sigset_t tmp_mask;
> +
> +	if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
> +		return false;
> +
> +	current->real_blocked = current->blocked;
> +	sigfillset(&tmp_mask);
> +	set_current_blocked(&tmp_mask);

This prevents signal delivery in exit_to_user_mode_loop(), right?

> +	return true;
> +}
> +
> +void __uaccess_buffer_post_exit_loop(void)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&current->sighand->siglock, flags);
> +	current->blocked = current->real_blocked;
> +	recalc_sigpending();

This restores the signal blocked mask _after_ exit_to_user_mode_loop()
has completed, recalculates pending signals and goes out to user space
with eventually pending signals.

How is this supposed to be even remotely correct?

But that aside, let me look at the whole picture as I understand it from
reverse engineering it. Yes, reverse engineering, because there are
neither comments in the code nor any useful information in the
changelogs of 2/7 and 4/7. Also the cover letter and the "documentation"
are not explaining any of this and just blurb about sanitizers and how
wonderful this all is.

> @@ -70,6 +71,9 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>  			return ret;
>  	}
>  
> +	if (work & SYSCALL_WORK_UACCESS_BUFFER_ENTRY)
> +		uaccess_buffer_syscall_entry();

This conditionally sets SYSCALL_WORK_UACCESS_BUFFER_EXIT.

> @@ -247,6 +256,9 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
>  
>  	audit_syscall_exit(regs);
>  
> +	if (work & SYSCALL_WORK_UACCESS_BUFFER_EXIT)
> +		uaccess_buffer_syscall_exit();

When returning from the syscall and SYSCALL_WORK_UACCESS_BUFFER_EXIT is
set, then uaccess_buffer_syscall_exit() clears
SYSCALL_WORK_UACCESS_BUFFER_EXIT, right?

This is called _before_ exit_to_user_mode_prepare(). So why is this
__uaccess_buffer_pre/post_exit_loop() required at all?

It's not required at all. Why?

Simply because there are only two ways how exit_to_user_mode_prepare()
can be reached:

  1) When returning from a syscall

  2) When returning from an interrupt which hit user mode execution

#1 SYSCALL_WORK_UACCESS_BUFFER_EXIT is cleared _before_
   exit_to_user_mode_prepare() is reached as documented above.

#2 SYSCALL_WORK_UACCESS_BUFFER_EXIT cannot be set because the entry
   to the kernel does not go through syscall_trace_enter().

So what is this pre/post exit loop code about? Handle something which
cannot happen in the first place?

If at all this would warrant a:

	if (WARN_ON_ONCE(test_syscall_work(UACCESS_BUFFER_ENTRY)))
        	do_something_sensible();

instead of adding undocumented voodoo w/o providing any rationale. Well,
I can see why that was not provided because there is no rationale to
begin with.

Seriously, I'm all for better instrumentation and analysis, but if the
code provided for that is incomprehensible, uncommented and
undocumented, then the result is worse than what we have now.

If you think that this qualifies as documentation:

> +/*
> + * uaccess_buffer_syscall_entry - hook to be run before syscall entry
> + */

> +/*
> + * uaccess_buffer_syscall_exit - hook to be run after syscall exit
> + */

> +/*
> + * uaccess_buffer_pre_exit_loop - hook to be run immediately before the
> + * pre-kernel-exit loop that handles signals, tracing etc. Returns a bool to
> + * be passed to uaccess_buffer_post_exit_loop.
> + */

> +/*
> + * uaccess_buffer_post_exit_loop - hook to be run immediately after the
> + * pre-kernel-exit loop that handles signals, tracing etc.
> + * @pending: the bool returned from uaccess_buffer_pre_exit_loop.
> + */

then we have a very differrent understanding of what documentation
should provide.

Thanks,

        tglx