[PATCH v2] um: time-travel: fix time corruption

Johannes Berg johannes at sipsolutions.net
Fri Oct 27 08:54:29 PDT 2023


On Fri, 2023-10-27 at 17:32 +0200, Benjamin Beichler wrote:
> Am 27.10.2023 um 16:45 schrieb Johannes Berg:
> > On Fri, 2023-10-27 at 16:05 +0200, Benjamin Beichler wrote:
> > > - Besides this, when you look into local_irq_save for um, it does 
> > > not really deactivate the interrupts, but delays their processing and 
> > > adds a memory barrier. I think that could be one of the important 
> > > consequences, as changes to the event list are forced to be propagated. 
> > > But this is only a guess :-D 
> > No, it *does* disable interrupts from Linux's POV. The memory barrier 
> > is just there to make _that_ actually visible "immediately". Yes it 
> > doesn't actually disable the *signal* underneath, but that doesn't 
> > really mean anything? Once you get the signal, nothing happens if it's 
> > disabled.
> Maybe I was a bit sloppy here. What I meant was that a signal could 
> interrupt the code even when interrupts are disabled,

Yes, briefly, to find it shouldn't do anything. So nothing should happen
because of it?
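
To make the point concrete, here is a minimal userspace sketch of "soft"
interrupt disabling. It is only a model: every model_* name is hypothetical
and not the actual UML code, and the fence stands in for whatever barrier
the real implementation uses. The host signal is never blocked; the handler
just notes that something is pending and returns while the flag is clear.

```c
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical model state; these are not the real UML symbols. */
static volatile sig_atomic_t model_irqs_enabled = 1;
static volatile sig_atomic_t model_irq_pending;
static volatile sig_atomic_t model_irqs_handled;

/* Stand-in for running the actual driver interrupt handlers. */
void model_handle_irq(void)
{
	model_irqs_handled++;
}

/* The signal still arrives, but does nothing except record itself
 * while interrupts are disabled from the kernel's point of view. */
void model_sig_handler(int sig)
{
	(void)sig;
	if (!model_irqs_enabled) {
		model_irq_pending = 1;
		return;
	}
	model_handle_irq();
}

int model_irq_save(void)
{
	int flags = model_irqs_enabled;

	model_irqs_enabled = 0;
	/* Make the cleared flag visible to the handler "immediately". */
	atomic_signal_fence(memory_order_seq_cst);
	return flags;
}

void model_irq_restore(int flags)
{
	atomic_signal_fence(memory_order_seq_cst);
	model_irqs_enabled = flags;
	/* Deliver whatever arrived while we were disabled. */
	if (flags && model_irq_pending) {
		model_irq_pending = 0;
		model_handle_irq();
	}
}

/* Returns (handled while disabled) * 10 + (handled after restore). */
int model_demo(void)
{
	int flags, during;

	signal(SIGUSR1, model_sig_handler);
	flags = model_irq_save();
	raise(SIGUSR1);		/* handler runs, but only sets pending */
	during = model_irqs_handled;
	model_irq_restore(flags);
	return during * 10 + model_irqs_handled;
}
```

In this model, model_demo() returns 1: nothing is handled while the flag is
clear, and the one pending interrupt is handled on restore. Note that the
sketch deliberately leaves out the time-travel handlers, which is exactly
the part that is discussed further down.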

> resulting in all the nice preemption consequences.

What do you mean?

> On real systems, maybe only an NMI does that?
> And especially the SIGIO handler intentionally keeps calling the time 
> travel handlers when interrupts are disabled. The interrupt handlers of 
> drivers are called later, but the time travel handlers tend to change 
> the event list.

Ah! Yes, that's true too ... hmm.

> I think I like the model too much, it fits so nicely in common DES 
> primitives. :-D

:)

> We have unlimited processing power (or all processing has zero 
> execution time),
> so why delay a program that gets a timestamp from the system once in 
> a while?

Yeah, OK, fair enough.

Still, we _do_ need this to make progress at all in some cases, like the
python case described there somewhere.

> Moreover, since I want a really deterministic model, I expect that 
> if I send a msg at timestamp t1, my program should create an answer 
> at t1 and not sporadically at t1+delta because the file system driver 
> took a timestamp in a background task.

Well at least it should grab the background task at the same time every
time ;-) But yeah that's only really true if you have all of the patches
we have to nail down random number generation ...

> > > In this case, irqs_disabled more or less only tend to indicate, that 
> > > the event list could be manipulated, and therefore the update_time 
> > > call is a bad idea. 
> > Which is kind of what I was thinking about earlier, but Vincent says 
> > it's not _just_ for that.
> Mhh, I think Vincent only wondered whether the recursion statement 
> was right.

No, I meant the fact that he reported that changing this to always delay
was breaking it.

> > We probably should just add a spinlock for this though - right now 
> > none of this is good for eventual SMP support, and it's much easier to 
> > reason about when we have a lock?
> Mhh, I think that would make sense. As I said, we also had problems in 
> ext-mode, which disappeared after I introduced local_irq_save(flags); 
> around all operations that modify the event list. I think a spinlock 
> would show the intention more clearly. For SMP we may need some 
> fine-grained scheme (or RCU-based locking) to prevent deadlocks, but 
> again I'm not sure.
> 

RCU isn't really locking, I don't see how that'd work in the face of
changing the list. :)

But a spinlock should work, we might need to also disable IRQs though.

Well anyway, I don't think I'll do any work here before you've also
posted your patches.

And maybe we should apply all the patches on the list now since they
make things better, and then reconsider the model more carefully with
the next set of changes?

johannes


