[PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode

Johannes Berg johannes at sipsolutions.net
Fri Nov 10 09:29:42 PST 2023


On Fri, 2023-11-10 at 16:54 +0100, Benjamin Beichler wrote:
> Am 06.11.2023 um 21:51 schrieb Johannes Berg:
> > On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> > > This slows down external TT-mode as more simulation roundtrips are
> > > required, and it unnecessarily affects the determinism and accuracy of
> > > the simulation.
> > I still don't think this is really true, it doesn't really affect
> > determinism? It makes it ... different, sure, but not non-deterministic?
> I intentionally kept it vague, but what I meant is that it's 
> unnecessarily challenging to determine.

Yeah, ok, fair enough.

> Perhaps I should mention that I'm running an unmodified Ubuntu rootfs 
> with systemd, which starts many daemons and other processes.

That sounds like a bit of a nightmare, to be honest. Wouldn't you want
to keep things under tighter control? But I guess it really depends on
what you're trying to achieve.

> To me, it seems illogical to delay everything just because one process 
> is waiting for a timestamp.

Yeah I guess we'll just have to disagree ;-) You're running some
process, so you've kind of decided to "give it time" of sorts, and in a
normal system reading time will always take time, just like everything
else :)

But anyway, I'm not really opposed to this patch, it's just ... not
great, I guess? And like I said, makes more sense to squash 9 and 10?

> We haven't yet patched the random device that fetches random bytes
> from the host (do you already have a patch for this?), so complete
> repeatability isn't guaranteed at the moment. However, that could be
> a logical next step.
> > > +static const int suspicious_busy_loop_syscalls[] = {
> > > +     36, //sys_getitimer
> > > +     96, //sys_gettimeofday
> > > +     201, //sys_time
> > > +     224, //sys_timer_gettime
> > > +     228, //sys_clock_gettime
> > > +     287, //sys_timerfd_gettime
> > > +};
> > That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> > least?
> Actually, this was a quick attempt to address the issue, and at the
> time I couldn't locate the appropriate macros.
> 
> These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl 
> (or 32 if configured in that manner).
> 
> I might be overlooking something, but it seems that __NR_timer_gettime 
> isn't defined in the kernel. If you have a better reference for this 
> translation, I'd appreciate it.

Look at the arch/x86/include/generated/uapi/asm/unistd*.h files after
you build the tree. How do they actually get generated? Beats me.
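
For illustration only (this is not from the patch): the same table
written with symbolic names, as a userspace sketch. It assumes a Linux
host whose <sys/syscall.h> provides the __NR_* constants;
is_suspicious_syscall() is a made-up helper name, and the #ifdef guards
are there because not every architecture defines every one of these
syscalls. Whether the UML code sees the same names through the
generated unistd headers is an assumption, not something checked here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>	/* provides the __NR_* syscall numbers */

/* Time-reading syscalls, named symbolically instead of by raw number. */
static const int suspicious_busy_loop_syscalls[] = {
#ifdef __NR_getitimer
	__NR_getitimer,
#endif
#ifdef __NR_gettimeofday
	__NR_gettimeofday,
#endif
#ifdef __NR_time
	__NR_time,
#endif
#ifdef __NR_timer_gettime
	__NR_timer_gettime,
#endif
#ifdef __NR_clock_gettime
	__NR_clock_gettime,
#endif
#ifdef __NR_timerfd_gettime
	__NR_timerfd_gettime,
#endif
};

/* Hypothetical helper: linear scan over the short table above. */
static bool is_suspicious_syscall(int nr)
{
	size_t i;
	size_t n = sizeof(suspicious_busy_loop_syscalls) /
		   sizeof(suspicious_busy_loop_syscalls[0]);

	for (i = 0; i < n; i++) {
		if (suspicious_busy_loop_syscalls[i] == nr)
			return true;
	}
	return false;
}
```

Compared with raw numbers, the table then no longer depends on whether
the 64-bit or the 32-bit syscall table is in use.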

> > > +static bool suspicious_busy_loop(void)
> > > +{
> > > +     int i;
> > > +     int syscall = syscall_get_nr(current, task_pt_regs(current));
> > > +
> > > +     for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
> > > +             if (suspicious_busy_loop_syscalls[i] == syscall)
> > > +                     return true;
> > > +     }
> > Might also be faster to have a bitmap? But ... also kind of awkward I
> > guess.
> Actually, a short fixed-size array should be optimized quite well with
> loop unrolling or similar, shouldn't it? I could also use a switch
> covering all the calls, but this loop seems the easiest to me.

Yeah, maybe. I haven't checked the output.
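
To make the bitmap alternative concrete, here is a rough userspace
sketch, again not taken from the patch. MAX_SYSCALL_NR is an assumed
upper bound, and mark_suspicious(), init_suspicious_bitmap() and
bitmap_is_suspicious() are made-up names. In-kernel code would
presumably use DECLARE_BITMAP() and test_bit() instead of the
hand-rolled shifts shown here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>	/* provides the __NR_* syscall numbers */

#define MAX_SYSCALL_NR	1024	/* assumed bound, not a kernel constant */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per syscall number; a set bit means "time-reading syscall". */
static unsigned long suspicious_bitmap[MAX_SYSCALL_NR / BITS_PER_WORD + 1];

static void mark_suspicious(int nr)
{
	if (nr < 0 || nr >= MAX_SYSCALL_NR)
		return;
	suspicious_bitmap[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Fill the bitmap once; guards skip syscalls this architecture lacks. */
static void init_suspicious_bitmap(void)
{
#ifdef __NR_getitimer
	mark_suspicious(__NR_getitimer);
#endif
#ifdef __NR_gettimeofday
	mark_suspicious(__NR_gettimeofday);
#endif
#ifdef __NR_time
	mark_suspicious(__NR_time);
#endif
#ifdef __NR_timer_gettime
	mark_suspicious(__NR_timer_gettime);
#endif
#ifdef __NR_clock_gettime
	mark_suspicious(__NR_clock_gettime);
#endif
#ifdef __NR_timerfd_gettime
	mark_suspicious(__NR_timerfd_gettime);
#endif
}

/* Constant-time lookup: a range check plus a single bit test. */
static bool bitmap_is_suspicious(int nr)
{
	if (nr < 0 || nr >= MAX_SYSCALL_NR)
		return false;
	return (suspicious_bitmap[nr / BITS_PER_WORD] >>
		(nr % BITS_PER_WORD)) & 1UL;
}
```

The lookup cost no longer depends on the number of entries, at the
price of a one-time init step and a bound that has to be kept in sync
with the syscall table, which is probably the awkward part.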

> > I dunno. I'm not even sure what you're trying to achieve - apart from
> > "determinism" which seems odd or even wrong, and speed, which is
> > probably easier done with a better free-until and the shared memory
> > calendar we have been working on.
> From my perspective, delaying get_timer only serves as a tie-breaker
> for poorly behaving software that resorts to a busy-loop while waiting
> for time to advance.

Yeah I guess that's true.

> While this behavior might not be uncommon, why penalize all processes 
> for it?

Well, I think we have a different sense of "penalize"; I wouldn't call
it that. I mean, you can't reasonably expect that getting a timestamp
takes no time at all, that's just not how physical reality works? Now
we're bending the rules here in that a lot of things that normally take
time suddenly don't, but I guess I don't fully understand why you're so
keen on bending the rules _all the way_.

But I think that's this:

> Consider an experiment where I aim to measure the impact of network 
> latency on software. Sometimes, the response latency fluctuates
> because a background task was scheduled randomly but deterministically 

"randomly but deterministically" is kind of fun 😂️

> at the same time, obtaining a timestamp that blocks all other
> processes and advances simulation time. This action simply undermines 
> the utility of the time travel mode unnecessarily.
> 
> However, software actively waiting for time advancement in a busy loop 
> achieves its goal. It’s almost a win-win situation, isn't it?

Fair enough.

johannes


