[RFC PATCH v2 10/13] x86/um: nommu: signal handling

Sat Nov 30 17:38:26 PST 2024

Hello,

On Thu, 28 Nov 2024 19:37:21 +0900,
Benjamin Berg wrote:

> > +#ifndef CONFIG_MMU
> > + memset(&r, 0, sizeof(r));
> > + /* mark is_user=1 when the IP is from userspace code. */
> > + if (mc && (REGS_IP(mc->gregs) > uml_reserved
> > +    && REGS_IP(mc->gregs) < high_physmem))
> > + r.is_user = 1;
> > + else
> > +#endif
> > + r.is_user = 0;
> 
> Does this work if we load modules dynamically?
> 
> I suppose one could map them into a separate memory area rather than
> running them directly from the physical memory.
> Otherwise we'll also get problem with the SECCOMP filter.

currently, I thought modules use the separate area from execmem, but
nommu allocator ignores this location info to map the memory; instead
mixing up with area used by userspace programs.

we may be able to come up with execmem_arch_setup() to fix this
situation.

so, no, this is_user detection doesn't work; modules also become
is_user=1.

MMU full allocator (normal UML and seccomp asl well ?) seems to be
fine as long as using execmem.

I will look into detail how we should handle.

> >   if (sig == SIGSEGV) {
> >   /* For segfaults, we want the data from the sigcontext. */
> >   get_regs_from_mc(&r, mc);
> > @@ -191,6 +199,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p)
> >   ucontext_t *uc = p;
> >   mcontext_t *mc = &uc->uc_mcontext;
> >   unsigned long pending = 1UL << sig;
> > + int is_segv = 0;
> >  
> >   do {
> >   int nested, bail;
> > @@ -214,6 +223,7 @@ static void hard_handler(int sig, siginfo_t *si, void *p)
> >  
> >   while ((sig = ffs(pending)) != 0){
> >   sig--;
> > + is_segv = (sig == SIGSEGV) ? 1 : 0;
> >   pending &= ~(1 << sig);
> >   (*handlers[sig])(sig, (struct siginfo *)si, mc);
> >   }
> > @@ -227,6 +237,12 @@ static void hard_handler(int sig, siginfo_t *si, void *p)
> >   if (!nested)
> >   pending = from_irq_stack(nested);
> >   } while (pending);
> > +
> > +#ifndef CONFIG_MMU
> > + /* if there is SIGSEGV notified, let the userspace run w/ __noreturn */
> > + if (is_segv)
> > + sigsegv_post_routine();
> > +#endif
> >  }
> 
> I am confused, this doesn't feel quite correct to me.

thanks for pointing this out.  the above code, which I spot the
working example under nommu, is indeed suspicious and doesn't look a
right code.

that signal handing (this patch) is immature, and need more work to
understand existing code, nommu characteristic, etc.

> So, for normal UML, I think we always do an rt_sigreturn. Which means,
> we always go back to the corresponding *kernel* task. To schedule in
> response to SIGALRM, we forward the signal to the userspace process.
> I believe that means:
>    1. We cannot schedule kernel threads (that seems like a bug)
>    2. Scheduling for userspace happens once the signal is delivered.
>       Then userspace() saves the state and calls interrupt_end().
> 
> 
> Now, keep in mind that we are on the separate signal stack here. If we
> jump anywhere directly, we abandon the old state information stored by
> the host kernel into the mcontext. We can absolutely do that, but we
> need to be careful to not forget anything.
> 
> As such, I wonder whether nommu should:
>    1. When entering from kernel, update "current->thread.switch_buf"
>       from the mcontext.
>        - If we need to schedule, push a stack frame that calls the scheduling
>          code and returns with the correct state.
>    2. When entering from user, store the task registers from the
>       mcontext. At some point (here or earlier) ensure that the
>       "current->thread.switch_buf" is set up so that we can return to
>       userspace by restoring the task registers.
>        - To schedule, piggy back on 1. or add special code.
>    3. Always do a UML_LONGJMP() back into the "current" task.

thanks, the current code jumps in the signal handler and unblocking
signals without returning the handler (and not calling rt_sigreturn at
host either) upon SIGSEGV, which should not work as you mentioned.

I will also investigate how I can handle.

> That said, I am probably not having the full picture right now.
> 
> Benjamin
> 
> PS: On a further note, I think the current code to enter userspace
> cannot handle single stepping. I suppose that is fine, but you should
> probably set arch_has_single_step to 0 for nommu.

I did almost zero tests with ptrace(2) (inside nommu UM) and might
miss a lot of features that mmu-UM could.  will also look into that.

thanks,

-- Hajime