[PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
Benjamin Beichler
Benjamin.Beichler at uni-rostock.de
Fri Nov 10 07:54:02 PST 2023
Am 06.11.2023 um 21:51 schrieb Johannes Berg:
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> This slows down external TT-mode as more simulation roundtrips are
>> required, and it unnecessarily affects the determinism and accuracy of
>> the simulation.
> I still don't think this is really true, it doesn't really affect
> determinism? It makes it ... different, sure, but not non-deterministic?
I intentionally kept it vague, but what I meant is that the resulting
timing becomes unnecessarily hard to reason about.
Perhaps I should mention that I'm running an unmodified Ubuntu rootfs
with systemd, which starts many daemons and other processes.
To me, it seems illogical to delay everything just because one process
is waiting for a timestamp.
At the moment, we haven't patched the random device that fetches random
bytes from the host (do you already have a patch for this?),
so complete repeatability isn't guaranteed yet. However, that could be
a logical next step.
>> +static const int suspicious_busy_loop_syscalls[] = {
>> + 36, //sys_getitimer
>> + 96, //sys_gettimeofday
>> + 201, //sys_time
>> + 224, //sys_timer_gettime
>> + 228, //sys_clock_gettime
>> + 287, //sys_timerfd_gettime
>> +};
> That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> least?
Actually, this was a quick attempt to address the issue, and during that
period, I couldn't locate the appropriate macros.
These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl
(or 32 if configured in that manner).
I might be overlooking something, but it seems that __NR_timer_gettime
isn't defined in the kernel. If you have a better reference for this
translation, I'd appreciate it.
I could check whether the current syscall number translates into the
corresponding function symbol in the UML syscall table.
>> +static bool suspicious_busy_loop(void)
>> +{
>> + int i;
>> + int syscall = syscall_get_nr(current, task_pt_regs(current));
>> +
>> + for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
>> + if (suspicious_busy_loop_syscalls[i] == syscall)
>> + return true;
>> + }
> Might also be faster to have a bitmap? But ... also kind of awkward I
> guess.
Actually, a short fixed-size array should be optimized quite well with
loop unrolling and similar transformations, shouldn't it? I could also
write a switch over all the calls, but the loop seems simplest to me.
> I dunno. I'm not even sure what you're trying to achieve - apart from
> "determinism" which seems odd or even wrong, and speed, which is
> probably easier done with a better free-until and the shared memory
> calendar we have been working on.
From my perspective, delaying the timer read only serves as a
tie-breaker for poorly behaved software that resorts to a busy loop
while waiting for time to advance.
While this behavior might not be uncommon, why penalize all processes
for it?
Consider an experiment where I aim to measure the impact of network
latency on software. Sometimes the response latency fluctuates because
a background task happens to be scheduled at the same moment
(arbitrarily, even if reproducibly) and obtains a timestamp, which
blocks all other processes and advances simulation time. That
needlessly undermines the utility of the time-travel mode.
However, software actively waiting for time advancement in a busy loop
achieves its goal. It’s almost a win-win situation, isn't it?
Benjamin