Race between SIGIO and epoll from SMP host
Anton Ivanov
anton.ivanov at kot-begemot.co.uk
Wed Apr 21 13:32:10 BST 2021
On 21/04/2021 12:53, YiFei Zhu wrote:
> Hi
>
> I was playing around with xxd large files when I noticed that the
> terminal output often stalls for a few seconds. It's a heisenbug so
> the root cause was mostly found by debugging-by-printing and
> trial-and-error.
>
> This bug I can reliably reproduce by running `xxd /dev/zero` inside
> UML that has its tty as `fd:0,fd:1`. Behavior: the terminal output
> stalls in the middle of a line, stalling for seconds and possibly a
> very long time. Keyboard input de-stalls the output but will stall
> again quickly.
>
> AFAICT, the way UML receives CONSOLE_WRITE_IRQ from stdout is that, it
> sets O_ASYNC on stdout, then whenever a SIGIO is received by UML, it
> checks for the origin of the SIGIO via epoll_wait in
> os_waiting_for_events_epoll. However, this is racy: A SIGIO can be
> generated prior to epoll receiving the event, so epoll_wait with zero
> timeout may be called before epoll receives the event, and the event
> can be missed.
>
> As a minimal reproducible example, this C code running on the host
> often assert failure in `res == 1`:
>
> [snip]
> #define _GNU_SOURCE 1
>
> #include <assert.h>
> #include <errno.h>
> #include <stdio.h>
> #include <signal.h>
> #include <sys/epoll.h>
> #include <sys/fcntl.h>
> #include <unistd.h>
>
> #define MAX_EPOLL_EVENTS 64
> volatile sig_atomic_t state;
> static struct epoll_event epoll_events[MAX_EPOLL_EVENTS];
>
> static char msg[] =
> "00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................\n";
>
> static void sigio_handler(int ignored)
> {
> assert(state == 1 || state == 0);
> state = 2;
> }
>
> int main(int argc, char **argv)
> {
> int epollfd, flags, res;
> struct epoll_event event;
>
> state = 0;
>
> signal(SIGIO, sigio_handler);
>
> epollfd = epoll_create(MAX_EPOLL_EVENTS);
> assert(epollfd >= 0);
>
> event.data.ptr = NULL;
> event.events = EPOLLOUT | EPOLLET;
> assert(!epoll_ctl(epollfd, EPOLL_CTL_ADD, 1, &event));
>
> flags = fcntl(1, F_GETFL);
> assert(flags >= 0);
> flags |= O_ASYNC | O_NONBLOCK;
>
> assert(!fcntl(1, F_SETFL, flags));
> assert(!fcntl(1, F_SETSIG, SIGIO));
> assert(!fcntl(1, F_SETOWN, getpid()));
>
> for (;;) {
> if (state == 0) {
> res = write(1, msg, sizeof(msg));
> if (res < 0 && errno == EAGAIN)
> state = 1;
> } else if (state == 1) {
> pause();
> } else if (state == 2) {
> res = epoll_wait(epollfd, epoll_events, MAX_EPOLL_EVENTS, 1);
> assert(res == 1);
> state = 0;
> }
> }
> }
> [/snip]
>
> strace -e epoll_wait shows, the last few lines:
>
> --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
> epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
> --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
> epoll_wait(3, [{events=EPOLLOUT, data={u32=0, u64=0}}], 64, 1) = 1
> --- SIGIO {si_signo=SIGIO, si_code=POLL_OUT, si_band=772} ---
> epoll_wait(3, [], 64, 1) = 0
> --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2033614,
> si_uid=1000} ---
> +++ killed by SIGABRT (core dumped) +++
>
> I reproduced this assertion failure on:
> * a self-built 5.11.14 SMP vanilla kernel running on bare metal, with
> terminal as a graphical application.
> * a 4.19.0-0.bpo.14-amd64 Debian 4.19.171-2~deb9u1 SMP kernel with
> terminal as ssh terminal.
>
> Can still reproduce after setting timeout to 1 ms, in both in UML and
> in this minimal example.
>
> I looked into how SIGIO and epoll events are generated for a tty, and
> it seems that SIGIO is generated strictly before epoll events are
> generated. gdb backtrace looks like (yes, on UML; didn't bother to
> spin up qemu since I had a UML gdb open):
>
> #0 send_sigio (fown=0x674a4b98, fd=0x1, band=0x2) at ./fs/fcntl.c:795
> #1 kill_fasync_rcu (band=<optimized out>, sig=<optimized out>,
> fa=0x672f90c0) at ./fs/fcntl.c:1019
> #2 kill_fasync (fp=<optimized out>, sig=0x1d, band=0x2) at ./fs/fcntl.c:1033
> #3 n_tty_write_wakeup (tty=<optimized out>) at ./drivers/tty/n_tty.c:243
> #4 tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:533
> #5 tty_port_default_wakeup (port=<optimized out>) at
> ./drivers/tty/tty_port.c:50
> #6 tty_port_tty_wakeup (port=0x615ded80 <vts>) at
> ./drivers/tty/tty_port.c:388
>
> #0 ep_poll_callback (wait=0x674d2ad0, mode=0x1, sync=0x0, key=0x4)
> at ./fs/eventpoll.c:361
> #1 __wake_up_common (wq_head=0x67485ce8, mode=0x1,
> nr_exclusive=0x1, wake_flags=0x0, key=0x4, bookmark=0x61396f70) at
> ./kernel/sched/wait.c:108
> #2 __wake_up_common_lock (wq_head=0x67485ce8, mode=0x1,
> nr_exclusive=0x1, wake_flags=0x0, key=0x4) at
> ./kernel/sched/wait.c:138
> #3 __wake_up (wq_head=0x67485ce8, mode=0x1, nr_exclusive=0x1,
> key=0x4) at ./kernel/sched/wait.c:157
> #4 tty_wakeup (tty=0x67485800) at ./drivers/tty/tty_io.c:537
> #5 tty_port_default_wakeup (port=<optimized out>) at
> ./drivers/tty/tty_port.c:50
> #6 tty_port_tty_wakeup (port=0x615ded80 <vts>) at
> ./drivers/tty/tty_port.c:388
>
> The last shared frame is in tty_wakeup, and n_tty_write_wakeup is
> called by `ld->ops->write_wakeup` whereas __wake_up is in the macro
> wake_up_interruptible_poll, so send_sigio is called strictly before
> ep_poll_callback for tty_wakeup.
>
> I have not tested on a non-SMP host, but on an SMP host, SIGIO could
> cause the waiting task to wake up on a different core, and the task
> could invoke epoll_wait before the interrupt handler generates the
> epoll event in ep_poll_callback. If this happens to UML, the interrupt
> for UML to make the tty ready will be missed until the next SIGIO.
>
> Considering that this is a race on the host, what would be the best
> way to fix this?
Interesting one. I need to think.
One option would be to wait for epoll events with a timeout which is larger than zero - f.e. HZ.
If we have received a SIGIO there is an epoll event on the way. The fact that it is not in the queue right now means that we are due to process it shortly.
A.
>
> YiFei Zhu
>
> _______________________________________________
> linux-um mailing list
> linux-um at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
>
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
More information about the linux-um
mailing list