[PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
Benjamin Beichler
Benjamin.Beichler at uni-rostock.de
Fri Nov 10 07:54:02 PST 2023
Am 06.11.2023 um 21:51 schrieb Johannes Berg:
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> This slows down external TT-mode as more simulation roundtrips are
>> required, and it unnecessarily affects the determinism and accuracy of
>> the simulation.
> I still don't think this is really true, it doesn't really affect
> determinism? It makes it ... different, sure, but not non-deterministic?
I intentionally kept it vague, but what I meant is that the resulting
timing becomes unnecessarily hard to reason about.
Perhaps I should mention that I'm running an unmodified Ubuntu rootfs
with systemd, which starts many daemons and other processes.
To me, it seems illogical to delay everything just because one process
is waiting for a timestamp.
At the moment, we haven't patched the random device that fetches random
bytes from the host (do you already have a patch for this?),
so complete repeatability isn't guaranteed yet. However, that could be
a logical next step.
>> +static const int suspicious_busy_loop_syscalls[] = {
>> + 36, //sys_getitimer
>> + 96, //sys_gettimeofday
>> + 201, //sys_time
>> + 224, //sys_timer_gettime
>> + 228, //sys_clock_gettime
>> + 287, //sys_timerfd_gettime
>> +};
> That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> least?
Actually, this was a quick attempt to address the issue, and during that
period, I couldn't locate the appropriate macros.
These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl
(or 32 if configured in that manner).
I might be overlooking something, but it seems that __NR_timer_gettime
isn't defined in the kernel. If you have a better reference for this
translation, I'd appreciate it.
I could check whether the current syscall number translates into the
corresponding function symbol in the UML syscall table.
>> +static bool suspicious_busy_loop(void)
>> +{
>> + int i;
>> + int syscall = syscall_get_nr(current, task_pt_regs(current));
>> +
>> + for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
>> + if (suspicious_busy_loop_syscalls[i] == syscall)
>> + return true;
>> + }
> Might also be faster to have a bitmap? But ... also kind of awkward I
> guess.
Actually, a short fixed-size array should be optimized quite well with
loop unrolling and similar transformations, shouldn't it? I could also
write a switch over all the calls, but the loop seems simplest to me.
> I dunno. I'm not even sure what you're trying to achieve - apart from
> "determinism" which seems odd or even wrong, and speed, which is
> probably easier done with a better free-until and the shared memory
> calendar we have been working on.
From my perspective, delaying the timer read only serves as a
tie-breaker for poorly behaved software that resorts to a busy loop
while waiting for time to advance.
While this behavior might not be uncommon, why penalize all processes
for it?
Consider an experiment where I aim to measure the impact of network
latency on software. Sometimes the response latency fluctuates because
a background task happens to be scheduled at the same moment
(arbitrarily, even if reproducibly) and obtains a timestamp, which
blocks all other processes and advances simulation time. That
needlessly undermines the utility of the time-travel mode.
However, software actively waiting for time advancement in a busy loop
achieves its goal. It’s almost a win-win situation, isn't it?
Benjamin