[PATCH] um: insert scheduler ticks when userspace does not yield

Mon Sep 23 14:50:27 PDT 2024

Hi,

On 23.09.2024 at 16:48, Benjamin Berg wrote:
>> Actually, I think timeouts are no problem if we can ensure that a
>> timeout is never rounded down to 0. A direct input of 0 usually has a
>> special meaning, or already provokes wrong behavior in the user space
>> program.
> I don't think that is a problem. The kernel should guarantee that a
> timeout never fires too early.
>
> I believe in the case of the linked python code, the timeout fires at
> exactly the correct time. And then the python code (incorrectly)
> detects that the timeout has not passed and tries to "select" again
> with a timeout of exactly zero.
>
> Really, that implementation is just buggy in subtle ways. It could
> probably just trust the kernel to not wake up early. And, if it does
> check whether the timeout has passed, then it should just accept the
> exact time.

Maybe I'm stating the obvious here, but I had the impression that this
code was written this way to handle interruptions by signals, not
because it doubts the accuracy of the clock. I may be totally wrong, but
it seems quite elegant to simply compare against the time here and avoid
the dance of masking signals or checking for interruptions.

I believe this code was written with the assumption that time() will
advance, so it can never become an endless loop; even the corner case of
a timeout of 0 would be covered by this.
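
To make the corner case concrete, here is a minimal sketch of such a
retry loop. It is not the linked Python code: SimClock, read_cost and
wait_with_retry are hypothetical names, and the clock is simulated so
that time only moves when the program sleeps or, optionally, when it
reads the clock.

```python
class SimClock:
    """Simulated clock (hypothetical): time moves only on sleep(),
    plus read_cost units for every read()."""

    def __init__(self, read_cost=0):
        self.now = 0
        self.read_cost = read_cost

    def read(self):
        t = self.now
        self.now += self.read_cost  # "reading the time costs time"
        return t

    def sleep(self, duration):
        # Stands in for select(..., timeout=duration): wakes exactly on time.
        self.now += duration


def wait_with_retry(clock, timeout, max_iterations=1000):
    """Retry until the deadline has strictly passed.

    Returns the number of loop passes, or None if the loop would
    not terminate within max_iterations.
    """
    deadline = clock.read() + timeout
    for iteration in range(1, max_iterations + 1):
        remaining = deadline - clock.read()
        if remaining < 0:  # exact deadline does not count as "passed"
            return iteration
        clock.sleep(remaining)  # remaining may be exactly 0 here
    return None
```

With read_cost=0 the loop sleeps until exactly the deadline and then
spins with a remaining time of 0, because nothing moves the clock any
more. With any non-zero read_cost the loop ends on the second pass.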

>> Since time-travel mode has a very limited niche, I would not try to
>> prevent every possible dumb behavior that bad user space programs could
>> have. I think busy-waiting on a system clock advancement is not the best
>> style, but acceptable.
>>
>> So my list was:
>>
>> sys_getitimer
>> sys_gettimeofday
>> sys_time
>> sys_timer_gettime
>> sys_clock_gettime
>> sys_timerfd_gettime
>>
>> Thinking about it further, I see that reading the access timestamps of
>> a file could also create an endless loop, so maybe the stat syscalls
>> should be included, although this makes me a bit uncomfortable again. I
>> tend to say that this "bad" behavior of asking for the same information
>> over and over again should only be punished if it happens multiple times.
>>
>> I was thinking about storing the PID of a busy-looping process and only
>> increasing time if the same PID is "suspicious". However, this "hack"
>> becomes more and more costly, although that is not very important for
>> time-travel mode.
> Maybe a stupid question, but aren't we overthinking this in general?
>
> While I think that Johannes' solution to make reading the time cost
> time is kind of ingenious, I really wonder how much of an issue this
> actually is. Because if this is just a few userspace applications and
> libraries misbehaving, then we might as well fix the issue there
> instead of doing anything special in UML.

You have a point, and such bugs can be fixed in user space. On the
other hand, what about software that we can't or don't want to fix and
that simply works in the wild? For my future use cases, I will run code
that I'm not able to compile myself. I would even consider a runtime
switch to change the behavior of this hack, to reduce the overhead in
simulations that behave nicely while still having a quick workaround for
misbehaving code.
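
As a rough illustration of the per-PID idea quoted above, combined with
such a runtime switch, here is a small model in Python. It is only a
sketch of the bookkeeping, not kernel code: BusyLoopDetector, threshold,
penalty_ns and both method names are made up for this example.

```python
from collections import defaultdict


class BusyLoopDetector:
    """Hypothetical model: charge time only to PIDs that keep
    reading the clock without simulated time moving in between."""

    def __init__(self, threshold=3, penalty_ns=1000, enabled=True):
        self.threshold = threshold    # free reads before a PID is "suspicious"
        self.penalty_ns = penalty_ns  # time to insert per suspicious read
        self.enabled = enabled        # the runtime switch
        self.streak = defaultdict(int)

    def on_time_read(self, pid):
        """Return how many ns simulated time should advance for this read."""
        if not self.enabled:
            return 0
        self.streak[pid] += 1
        if self.streak[pid] > self.threshold:
            return self.penalty_ns
        return 0

    def on_time_progress(self, pid):
        """Call when time really advanced for pid (e.g. it slept)."""
        self.streak.pop(pid, None)
```

A well-behaved program that reads the clock once or twice between
sleeps would never be charged, while a process polling the clock in a
loop starts paying penalty_ns per read after threshold reads.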

And sorry for repeating myself, but I believe that busy-waiting on an
increasing timer value, while not the best style, is considered
okay/normal for some use cases. So I think it would be helpful to be
able to execute such user space code.

But I want to bring in another idea: could we use an eBPF program to
dynamically hook into syscalls and do a timetravel_update or something
similar? Actually, I do not know whether eBPF works normally in UM, but
that way it would be flexible and would move the dirty hacks into small
pieces outside the kernel. From what I understand, we would need to add
an eBPF-callable wrapper for the time-travel update function, wouldn't we?

>
>>> One neat side effect is that if reading time does not actually cost
>>> time, then we could implement clock_gettime in the VDSO.
>> That would exactly not work, because of my comment from before.
> Of course. It is just that I have always in the back of my mind that
> syscalls and pagefaults (including minor faults) are really expensive
> in UML. So if the hack is moved elsewhere then implementing
> clock_gettime in the vDSO could be an easy win to speed up the
> simulation.

Hmm, I only had a quick look at "arch/x86/um/vdso/um_vdso.c", and from
my understanding, every vDSO call is currently converted into syscalls
of the host. So we would need much more code to use the time-travel
clock here, wouldn't we? Of course, my proposed eBPF hook would not work
here either...

>
> Benjamin
>
  kind regards

(the other) Benjamin