[PATCH] um: insert scheduler ticks when userspace does not yield

Mon Sep 23 07:48:19 PDT 2024

On Mon, 2024-09-23 at 15:56 +0200, Benjamin Beichler wrote:
> > > For "clean" semantics of a simulative execution of the kernel, it feels
> > > erroneous to advance time even if this value is only read once.
> > > 
> > > In my experiments timer_read was called much more often than I
> > > anticipated (e.g., filesystem code).
> > Yeah, that does not really sound like something we would want (and it
> > will also not help with performance with time-travel=ext).
> > 
> > Looking at the old discussion, it doesn't seem that Johannes was
> > against the idea of doing the time insertion only in more specific
> > scenarios. So, we "just" need a reasonably elegant solution.
> > 
> > If we accept writing a list of syscalls, then maybe we could just do it
> > within handle_syscall and do a um_udelay(1) for any syscall that takes
> > a timeout parameter (select, pselect6, poll, ...)? It is going to be a
> > pretty long list, but could still be reasonable.
> 
> That's actually not what my "hack" did. I filtered out all syscalls
> that give some information about the current timestamp of the system.

Yes, I know.

> Actually, I think timeouts are no problem if we can ensure that a
> timeout is never rounded down to 0. A direct input of 0 mostly has a
> special meaning, or provokes wrong behavior from the user space
> program in the first place.

I don't think that is a problem. The kernel should guarantee that a
timeout never fires too early.
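
As a small illustration of the "never rounded down to 0" point, here is a
sketch of converting a timeout into whole clock ticks by rounding up. The
function name and the nanosecond units are assumptions made for this
example; it is not how the kernel implements its timers.

```python
def timeout_to_ticks(timeout_ns: int, tick_ns: int) -> int:
    """Convert a timeout to whole ticks, rounding up.

    Hypothetical helper: a non-zero timeout always maps to at least one
    tick, so it can never fire early or collapse into "poll once".
    """
    if timeout_ns < 0 or tick_ns <= 0:
        raise ValueError("timeout must be >= 0 and tick must be > 0")
    # Ceiling division using integers only (no float rounding).
    return -(-timeout_ns // tick_ns)
```

Only an explicit 0 stays 0 here, which keeps its usual "do not block"
meaning.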

I believe in the case of the linked Python code, the timeout fires at
exactly the correct time. And then the Python code (incorrectly)
detects that the timeout has not passed and tries to "select" again
with a timeout of exactly zero.

Really, that implementation is just buggy in subtle ways. It could
probably just trust the kernel to not wake up early. And, if it does
check whether the timeout has passed, then it should just accept the
exact time.

(Note that e.g. Python asyncio explicitly takes into account the clock
resolution to avoid this type of issue.)
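
To make the failure mode concrete, here is a rough sketch on a simulated
clock that only advances while sleeping, as it would in time-travel mode.
All function names are hypothetical, and the strict check is only a guess
at the kind of comparison that causes the problem, not the linked code.

```python
import time

def expired_strict(now, deadline):
    # Hypothetical buggy check: "now == deadline" counts as not expired.
    return now > deadline

def expired_tolerant(now, deadline, resolution):
    # Accept the exact deadline, and anything within one clock tick of it.
    return now + resolution >= deadline

def wait_rounds(expired, start, timeout, max_rounds=5):
    """Count wake-ups until the timeout is seen as expired.

    Returns None if max_rounds is reached, i.e. the caller would keep
    selecting with a zero timeout forever.
    """
    now = start
    deadline = start + timeout
    for rounds in range(1, max_rounds + 1):
        remaining = max(deadline - now, 0.0)
        now += remaining  # the simulated select wakes at exactly the deadline
        if expired(now, deadline):
            return rounds
    return None

# Resolution of the monotonic clock as reported by the Python runtime.
RESOLUTION = time.get_clock_info("monotonic").resolution
```

With the strict check, wait_rounds() gives up and returns None, because
time never moves past the deadline. With the tolerant check it returns
after the first wake-up.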

> Since time-travel mode has a very limited niche, I would not try to 
> prevent every possible dumb behavior that bad user space programs could 
> have. I think busy-waiting on a system clock advancement is not the best 
> style, but acceptable.
> 
> So my list was:
> 
> sys_getitimer
> sys_gettimeofday
> sys_time
> sys_timer_gettime
> sys_clock_gettime
> sys_timerfd_gettime
> 
> While overthinking it, I see the possibility of reading the access
> timestamps of a file to create an endless loop, so maybe the stat
> syscalls should be included, although this makes me a bit uncomfortable
> again. I tend to say that this "bad" behavior of asking for the same
> information over and over again should only be punished if it happens
> multiple times.
> 
> I was thinking about storing the PID of a busy-looping process, and only
> increasing time if the same PID is "suspicious". However, this "hack"
> becomes more and more costly, which is on the other hand not important
> for time-travel mode.
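
For discussion, the PID idea could be modelled roughly as below. This is a
toy model in Python, not UML code: the class, its parameters, and the rule
that any other syscall clears the suspicion are all assumptions. Only the
syscall names come from the list above.

```python
# Syscalls from the list above that expose the current time.
TIME_READING_SYSCALLS = frozenset({
    "sys_getitimer", "sys_gettimeofday", "sys_time",
    "sys_timer_gettime", "sys_clock_gettime", "sys_timerfd_gettime",
})

class BusyLoopClock:
    """Toy virtual clock that advances only for repeated time reads."""

    def __init__(self, step_ns=1, threshold=2):
        self.now_ns = 0
        self.step_ns = step_ns      # hypothetical cost of a suspicious read
        self.threshold = threshold  # reads in a row before time is inserted
        self.suspect_pid = None
        self.repeats = 0

    def on_syscall(self, pid, name):
        """Record one syscall and return the (possibly advanced) time."""
        if name not in TIME_READING_SYSCALLS:
            # Simplification: any other syscall counts as progress.
            self.suspect_pid = None
            self.repeats = 0
            return self.now_ns
        if pid == self.suspect_pid:
            self.repeats += 1
        else:
            self.suspect_pid = pid
            self.repeats = 1
        if self.repeats >= self.threshold:
            self.now_ns += self.step_ns
        return self.now_ns
```

A single read costs nothing in this model. Time only moves once the same
PID asks for it again without doing anything else in between.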

Maybe a stupid question, but aren't we overthinking this in general?

While I think that Johannes' solution to make reading the time cost
time is kind of ingenious, I really wonder how much of an issue this
actually is. Because if this is just a few userspace applications and
libraries misbehaving, then we might as well fix the issue there
instead of doing anything special in UML.

> > One neat side effect is that if reading time does not actually cost
> > time, then we could implement clock_gettime in the vDSO.
> 
> That is exactly what would not work, because of my comment from before.

Of course. It is just that I always have in the back of my mind that
syscalls and page faults (including minor faults) are really expensive
in UML. So if the hack is moved elsewhere, then implementing
clock_gettime in the vDSO could be an easy win to speed up the
simulation.

Benjamin