[PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts

Johannes Berg johannes at sipsolutions.net
Fri Nov 24 06:53:17 PST 2023


Hi,

> Sorry for my delayed response to your detailed email. I find it quite 
> hard to discuss such a complex topic via mailing lists without it 
> sounding impolite.


Heh, no worries about either :)


> Maybe also as some basis for my reasoning: I'm quite familiar with 
> discrete event-based simulation, with a special focus on SystemC 
> simulation.


Yeah OK, fair. I've only tangentially heard about SystemC :)


> There are some common (old) principles of DES that map perfectly to
> the time travel mode, so my mental model has always been shaped around
> those constraints.
> - Every instance/object/element in a DES has some inputs/signals that
>   activate it either at the current moment or at some later point in
>   time.
> - Every instance/object/element in a DES will run, starting from its
>   activation time until it has "finished", without advancing the
>   global simulation time.
> - Events (or activations) occurring at the exact same point in
>   simulation time would happen in parallel. Therefore, the actual
>   order of execution in a sequential simulator is more or less
>   unimportant (though some implementations may require a deterministic
>   order, ensuring simulations with the same parameters yield the exact
>   same output, e.g., SystemC). The parallel execution of events at the
>   same time in the simulation is an optimization that may be
>   implemented, but is not always utilized.


Sure.
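
For reference, the way I picture that model is roughly the loop below.
Just a sketch with made-up names, not taken from any real scheduler
code: pop the earliest event, advance global time to it, run its
handler to completion; events with the same timestamp run back to back
without time advancing in between.

#include <stdint.h>

struct des_event {
	uint64_t time;			/* simulation time of activation */
	void (*handler)(void *data);	/* runs to completion, no time advance */
	void *data;
	struct des_event *next;		/* queue kept sorted by time */
};

static struct des_event *queue;
static uint64_t sim_time;

static void des_run(void)
{
	while (queue) {
		struct des_event *ev = queue;

		queue = ev->next;
		sim_time = ev->time;	/* the only place time advances */
		ev->handler(ev->data);	/* may enqueue further events */
	}
}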


> After reviewing all your analyses, I believe the most significant 
> difference in my implementation lies in the last point. I did not 
> enforce the order
> of message processing when they occur exactly at the same simulation 
> time. 


I'm not sure I fully agree with this - yes there's an ordering aspect to
it wrt. processing events that happen at the same simulation time, but
I'm not even sure that's the worst part of what you saw. Given the use
of SIGIO, the harder part is that you don't even have a guarantee that
an event is processed _at all_ at the same simulation time it was sent.
It might get processed later, at a different simulation time.


> Consequently, I modified my implementation to eliminate the synchronous
> (and, to be honest, quite hacky) read operation with special handling
> on the timetravel socket. Instead, I implemented a central epoll
> routine, which is called by my master simulation kernel (NS3). My
> rationale was that if I haven't received the request from the
> TT-protocol, I cannot advance time.


Yes, but even then I believe that a SIGIO event could be delayed to be
processed at a later simulation time than it should be.


> In conjunction with running not a single UML instance but many (my 
> current use case consists of a 10-node pair == 20 UML nodes), this can
> create all
> sorts of race/deadlock conditions, which we have identified.


FWIW, I'm typically running 5-6 nodes with a simulated WiFi device each,
so the scale isn't that much different.


> For me, the ACK of vhost/virtio seemed somewhat redundant, as it 
> provides the same information as the TT-protocol, 


Actually, no, I don't agree that it provides the same information. The
ACK provides - to the sender of the event - the acknowledgement that the
event was actually seen. However, I will say that it's actually more a
consequence of UML using SIGIO than a fundamental part of the
simulation. If you were actually able to say "UML can do nothing upon
exiting idle state without checking for events first", you'd probably be
correct. However, that's an incorrect assumption given how UML's
interrupt model is implemented now.

So yes, we could remove the SIGIO model from UML when time-travel=ext or
=infcpu is in use, but that'd be a much more invasive rework.

However, if we'd actually do such a rework, then I believe you'd be
_almost_ correct with that sentence. The only other remaining lack of
information would be an updated free-until value, though you could
assume that the receiver of the event always wants to process it
immediately and update free-until=now from the sender of the event.
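
To make that a bit more concrete: purely a sketch, with a made-up
send_tt_msg() helper, and assuming the op/seq/time message layout from
linux/um_timetravel.h - the sender side of such a scheme might inject
the event and then immediately request to be scheduled at the current
time, so that its own free-until effectively collapses to "now":

#include <stdint.h>
#include <unistd.h>
#include <linux/um_timetravel.h>

/* made-up helper: write a TT message and wait for the protocol ACK */
static void send_tt_msg(int fd, struct um_timetravel_msg *msg)
{
	struct um_timetravel_msg ack;

	write(fd, msg, sizeof(*msg));
	read(fd, &ack, sizeof(ack));	/* expect UM_TIMETRAVEL_ACK */
}

static void inject_event_give_up_free_until(int tt_fd, uint64_t now)
{
	struct um_timetravel_msg msg = {
		.op = UM_TIMETRAVEL_REQUEST,
		.time = now,	/* i.e. free-until collapses to "now" */
	};

	/* ... write the actual event to the device socket here ... */

	send_tt_msg(tt_fd, &msg);
}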


> assuming my device simulation
> resides within the scheduler.


I'm not sure that really makes a big difference. You could always set it
up in a way that the sender of the event causes free-until=now and
returns to the scheduler, though the scheduler would then have to
actually ask everyone for their current request [1] if it doesn't have
information about who an event was sent to.

[1] and each participant in the simulation would have to check for
pending events before answering such a question


> I must admit my assumption was incorrect, primarily because the
> implementation of the TT-protocol in the kernel is somewhat fragile
> and, most importantly, your requirement that ***nothing*** is allowed
> to interfere (creating SIGIOs) at certain states of a time-traveling
> UML instance is not well-documented.


I guess I don't agree with it being fragile, but that's OK :)

You make it sound like that's an inherent fault of the protocol, when
really it's a consequence of the distributed scheduling model, IMHO.


And I think maybe this is also where SystemC-based intuition fails,
because things are distributed here.


> We need the ACK not for an acknowledgment of registering the interrupt
> but to know that we are allowed to send the next TT-msg. This very
> tight coupling of these two protocols does not appear to be the best
> design or, at the very least, is poorly documented.


I think you're mixing up the design and the implementation a bit. We do
need the ACK for saying that the interrupt was actually registered,
because otherwise - given SIGIO - we have no chance to know it was
processed synchronously. But I think I've described the possible
variants of the model above already.

If you didn't use SIGIO and processed events and TT-msgs in the same
mainloop, you'd not have to worry about the states, and that's true of
all of my device implementations, see e.g. network in my scheduler. But
that still doesn't get rid of the requirement for ACKs given how we want
to have a free-until optimisation (and thus a form of distributed
scheduling.)
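
Roughly what I mean by "the same mainloop", as a sketch with made-up
handler names: one epoll loop owns both the TT control socket and the
device/event fds, so an event can never be noticed later (in simulation
time) than the protocol messages around it:

#include <sys/epoll.h>

/* made-up handlers, only to show the shape of the loop */
static void handle_tt_msg(int fd)       { (void)fd; /* RUN/FREE_UNTIL/... */ }
static void handle_device_event(int fd) { (void)fd; /* deliver the IRQ now */ }

static void mainloop(int epoll_fd, int tt_fd)
{
	struct epoll_event evt;

	while (epoll_wait(epoll_fd, &evt, 1, -1) > 0) {
		if (evt.data.fd == tt_fd)
			handle_tt_msg(evt.data.fd);
		else
			handle_device_event(evt.data.fd);
	}
}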


> The prohibition of interference in certain TT-states also led to my 
> second mistake. I relaxed my second DES requirement and allowed 
> interrupts when the UML instance is in the RUN-state. 

I tend to think this was another bug though: as I described in my
other email, it can lead to events not even registering at the same
simulation time they were sent at, due to the async nature of SIGIO and
the lack of polling when exiting the WAIT state.


> This decision was based on the impression that UML was built to work
> this way without TT, so why should it break when in TT-Mode (which you
> proved was wrong). Whether this is semantically reasonable or not can
> be questioned, but it triggered technical problems with the current
> implementation.
> 
> With this realization, I tend to agree that maybe the whole patches to
> ensure thread-safe (or reentrant-safe) access to the event list might
> be dropped.


OK.


> Still, we should ensure that the SIGIO is simply processed
> synchronously in the idle loop.


I think you mean something else, but as written, this is wrong. You
cannot "process SIGIO", but you can (otherwise) notice (poll for) the
event that caused SIGIO, or even get rid of the SIGIO entirely.

So I think you mean "all events are processed synchronously in or on
exit from the idle loop", and then I tend to agree, but that's a pretty
big departure from how interrupt processing works in UML normally.


> This aligns with my last DES constraint: since everything
> happens at the same moment in simulation time, we do not need "real" 
> interrupts but can process interrupts (SIGIOs) later (but at the same 
> simulation time).


(again, not SIGIO, but events on FDs)

Well, it's a trade-off. Yes we could do this, but the only advantage is
getting rid of the ACK messages, which are also needed for the
distributed scheduling model.


In a SystemC model I believe everything happens in a single process, so
sending an event basically can influence the scheduler already? Or
perhaps "scheduling" in that model doesn't actually just consist of
picking the next task from an existing list, but having each task check
for events first?

Either way, I'm not sure this works in the distributed model here.


> I think this approach only works in ext or cpu-inf mode and may be 
> problematic in "normal" timetravel mode.

Yes, for sure, 'normal' time-travel is not suitable for this kind of
simulation, and vice versa.


> I might even consider dropping the signal handler, which marks the
> interrupts pending, and processing the signals with a signalfd, but
> that is, again, only an optimization.


No, if you want to get away from the ACK requirement, you _have_ to drop
the signal handler, or at least make it do nothing, and understand that
events were queued in a different way, synchronously. The async nature
of the signal handlers causes all the problems we've been discussing -
unless, as I have done, you always wait for them to finish first with
the ACK requirement.
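
Concretely - again only a sketch - that would mean something like
keeping SIGIO blocked (or not installing the handler at all) and
picking up whatever is pending with an explicit, synchronous poll at a
well-defined point, e.g. when leaving the idle loop:

#include <signal.h>
#include <poll.h>
#include <stddef.h>

static void block_sigio(void)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIGIO);
	sigprocmask(SIG_BLOCK, &set, NULL);
}

/* called on idle-loop exit; fds[] is whatever the IRQ layer registered */
static void check_pending_events(struct pollfd *fds, nfds_t nfds)
{
	if (poll(fds, nfds, 0) > 0) {
		/* mark the corresponding interrupts pending / deliver them */
	}
}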


> Additionally, to address the interrupt acknowledgment for the serial
> line, I'd like to propose this: why not add an extra file descriptor
> in the command line, which is something the kernel could write to,
> such as an eventfd or a pipe, to signal the acknowledgment of the
> interrupt. For example, the command line changes to
> ssl0=fd:0,fd:1,fd:3. If somebody uses the serial line driver with
> timetravel mode but without that acknowledgment fd, we can emit a
> warning or an error.


Sure, that works. But above you were arguing against the ACKs and now
you want to implement them? Which is it? ;-)
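
FWIW, the device-simulation side of that would be trivial; a sketch,
assuming an eventfd is passed as the third fd as in your
ssl0=fd:0,fd:1,fd:3 example:

#include <stdint.h>
#include <stddef.h>
#include <unistd.h>

/* write console data, then block until the kernel acks the interrupt
 * by writing to the ack fd (an eventfd in this sketch) */
static void send_serial_and_wait_ack(int out_fd, int ack_fd,
				     const void *buf, size_t len)
{
	uint64_t acks;

	write(out_fd, buf, len);
	read(ack_fd, &acks, sizeof(acks));
}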


> I believe all these changes should work well with the shared memory 
> optimization and should make the entire time travel ext protocol a bit
> more robust, easier to use, and harder to misuse. ;-)

I'm not sure which changes really are left?

Adding the ACKs to the serial port, sure, that makes sense to me.


Getting rid of the ACKs overall, dunno, maybe it's possible but it
requires a different scheduling model, where _any_ external event makes
the sender (which was running) give up its free-until without knowing
that was needed [2], and then the scheduler has to ask everyone to
process pending events before actually picking the next task to run.
This was done somewhat implicitly by your scheduling request message,
so at that point your UML had time to process events. I think it was
still racy though, since it was still SIGIO based, which has no
ordering guarantee vs. the read from the TT socket.

[2] though you argue below it's always needed


> However, even after the lengthy discussion on "when" interrupts should
> be processed, I still hold the opinion that the response to raising an
> interrupt should be immediate to the device simulation and not delayed
> in simulation time. This only makes the device simulation harder
> without real benefit. If you want to delay the interrupt handling (ISR
> and so on), that's still possible and in both situations highly
> dependent on the UML implementation. If we want to add an interrupt
> delay, we need to implement something in UML anyway. If you want to
> delay it in the device driver, you can also always do it, but you are
> not at the mercy of some hard-to-determine extra delay from UML.

I think this is really completely orthogonal to the other discussion
apart from what I wrote at [2] above, but I don't necessarily think that
it always makes sense to process all events immediately, or to even
schedule for that. It doesn't even make the implementation harder to
request the time slot for handling the event at now + epsilon instead of
now. But anyway, no need to get into this.

> Overall, if you think that makes sense, I could start on some patches,
> or perhaps you feel more comfortable doing that.

I'm honestly not sure what you had in mind now, apart from the ack-fd on
serial?

johannes


