[PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts

Johannes Berg johannes at sipsolutions.net
Mon Nov 13 13:22:40 PST 2023


On Mon, 2023-11-13 at 12:46 +0100, Benjamin Beichler wrote:
> > So I don't really agree that baking a "must be scheduled immediately"
> > into any protocol really makes sense.

> I think you are mixing here some levels of abstraction. Some entity in 
> the "system" needs to react to an interrupt immediately.

Don't think I agree; that's not really necessary.

> In the real world, this is mostly 
> done by some kind of interrupt controller
> which may latch the interrupt (if they are deactivated from CPU side). 

Even that itself may just be polling some lines, theoretically.

But that's mostly theoretical - if you wanted to simulate some interrupt
latency, you'd simply not react 'immediately' but rather some amount of
time later; the interrupt controller and all that is basically all the
same piece of code.

> Of course some more sophisticated
> bus protocols (i.e. virtio/vhost in this case) can also signal the 
> device to do no interrupts, but in the most
> basic case, e.g. a button at a GPIO pin, the device could not handle 
> that. 

Not sure I follow - how's that related to the protocol? Either you get
an interrupt or you don't, but once you do get it, there's also nothing
that says you will handle it immediately.


> Most of the interrupt abstraction in um is realized via signals,

Which is actually unnecessary for this kind of simulation, since the UML
instance must be idle, by definition, for someone else to be running
and sending it 'real' interrupts. We could as well just directly build
an epoll loop there, but that's more complicated surgery in the innards
of UML's interrupt model that isn't really needed.

Without time-travel, it _is_ needed since you want to have 'real'
interrupts that actually stop a process from running and you get into
the device driver and all.

> but some more seems (IMHO) to be needed 
> from the tt-protocol, if we have external
> device simulations. 

Well, I just said above _less_ is actually needed ;-)

The _more_ that's needed is to actually ensure the "only one thing is
running" part, which I've built into the system via acknowledgements (or
perhaps a requirement for those).

> My suggestion was to add the virtual interrupt line 
> state change as a message to the calendar.

Yes, but it doesn't work unless the other side _already_ knows that it
will happen, because it breaks the rule of "only one thing is running".

Again you can argue that it's fine to return to the calendar even if the
receiving process won't actually do anything, but because the receiving
process is still thinking about it, you end up with all the contortions
you have to do in patch 4 and 7 ... because even the _calendar_ doesn't
know when the request will actually come to it.

Perhaps one way of handling this that doesn't require all those
contortions would be for the sender of the event to actually
_completely_ handle the calendar request for the interrupt on behalf of
the receiver, so that the receiver doesn't actually do _anything_ but
mark "this IRQ is pending" in this case. Once it actually gets a RUN
message it will start running, since it assumes that it will not
get a RUN message without having requested a calendar entry. If the
calendar entry were already handled on its behalf, you'd not need the
request and therefore not need the special handling for the request from
patch 4.
You'd need a different implementation of patches 2/3, and get rid of
patches 4, 6, 7 and 8.


So ... thinking about that more, the issue isn't so much that you're
making an assumption that the interrupt should happen right away, it's
that only some parts of your system are making that assumption (the
sender of the event that causes the interrupt), and then you're leaving
the rest of the system to catch up with something that actually gets
processed _asynchronously_, causing all kinds of trouble.

> Of course, you could argue, that the device simulation needs to be 
> clever enough (so distributing this task of the
> interrupt controller), but even therefore the device simulation need 
> some more information to reflect delayed interrupts,
> deactivated interrupts and so on.
> 
> What confuses me is, that you explicitly use the immediate synchronous 
> response model on the interrupts from the vhost/virtio
> driver, but you seem to refuse this on other drivers.

I didn't mean to _refuse_ it. In fact, you'll note that I made the
usfstl device abstraction able to trivially implement interrupt latency
by setting a single parameter (named "interrupt_latency" :-) ), so it's
definitely been on my mind; so far, though, no testing scenario has
needed it, and the simulation runs faster without it ;-)

So no, I'm not actually trying to tell you that you must implement
interrupt latency. Sorry I came across that way.

I was just trying to use interrupt latency as an argument for why
the recipient should be in control, but now having thought about it
more, I don't even think the recipient _needs_ to be in control, just
that one side needs to be _fully_ in control. The model you've built
has both sides in control and racing against each other (and, in the
case of UML, even racing with itself internally processing the signal).

> > > Why couldn't we be more
> > > explicit in the time travel protocol to the calendar about interrupts?
> > > We could use the ID from the start-msg to identify um instances and
> > > tell, in an additional msg, that we are going to trigger an interrupt at
> > > the current time to that instance from the device simulation.
> > But see above - I basically just don't think the sender of the event is
> > the right position to know that information.

Yeah you know what, I partially retract that statement. I still think
the right way to think about it is to have the recipient in
control, but there are indeed things like the line interrupts where
perhaps it's better to put the sender in control. It's perhaps not
trivial, since you might want to do a few writes; you'd have to combine
those into a single calendar interrupt-processing entry made on behalf
of the receiver.


> Mhh if I apply that to the vhost/virtio simulations, that is, what is 
> implicitly modeled by the exchanged file descriptors and vhost/virtio
> protocol state.

Not sure I follow? The whole file descriptors and all are just
establishing communication mechanisms?

> We may change the whole tt-ext-mode to only accept vhost/virtio drivers, 
> but that needs to
> be stated explicitly in the documentation. I'm not really in favor of 
> that, but if it is documented
> it is easier to understand.

See above. I guess we don't have to, if we put a bit more requirement on
the sender.

> > Don't think I understand that. Which "facilities" are you referring to
> > here?
> I hadn't researched that on Friday, but we could use a custom control 
> msg (cmsg) on the unix sockets
> attached to the (serial) line devices. The device simulation can then 
> wait on that ancillary data for
> synchronous interrupt handling, as you have proposed for vhost/virtio. 
> My first idea was to use
> MSG_OOB, which does not exist on Unix domain sockets. If someone does not 
> use uds as transport
> we may give a warning.

Not sure that really works; what would you transfer in the other
direction? Or send an fd and then wait for a message on that fd, or
something?

johannes
