[PATCH v2] um: time-travel: fix time corruption

Vincent Whitchurch Vincent.Whitchurch at axis.com
Thu Oct 26 01:49:21 PDT 2023


On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote:
> On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote:
> > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs)
> > >  		 */
> > >  		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
> > >  		    !time_travel_ext_waiting)
> > > -			time_travel_update_time(time_travel_time +
> > > -						TIMER_MULTIPLIER,
> > > -						false);
> > > +			time_travel_update_time_rel(TIMER_MULTIPLIER);
> > >  		return time_travel_time / TIMER_MULTIPLIER;
> > >  	}
> > 
> > The reason I hesitated with putting the whole of
> > time_travel_update_time() under local_irq_save() in my attempt was
> > because I didn't quite understand the reason for the !irqs_disabled()
> > condition here and the comment just above it about recursion and things
> > getting messed up.  If it's OK to disable interrupts as this patch does,
> > is the !irqs_disabled() condition valid?
> 
> Hmm. I was going to say that's different, because it wants to only
> prevent us from doing this while we're *already* in IRQ context, and the
> bug you found is calling timer_read() not in IRQ context, but getting an
> event queued by the signal.
> 
> But ... now that I think about it, I have a feeling that this was a
> workaround for the exact same problem, and I just didn't understand it
> at the time? I mean, recursing into our own processing is now impossible
> here after this patch - either we're running normally, or the interrupt
> cannot hit timer_read() in the middle, same as it cannot hit
> time_travel_handle_real_alarm() in the middle now.
> 
> Removing that still seems to work with your test, but it's also not a
> good test for this, since there are no devices etc. that could have
> interrupts, not sure how to test it right now?
> 
> Maybe I'll add a comment there saying this might no longer be needed?

I tried removing the !irqs_disabled() check, and it blew up pretty
quickly (trace below) when running the full roadtest suite.  Everything
works fine with your unmodified patch, so there is no need to change the
comment.

 Kernel panic - not syncing: time-travel: time goes backwards 26374790000864 -> 26374790000853
 show_stack.cold (arch/um/kernel/sysrq.c:56) 
 dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
 dump_stack (lib/dump_stack.c:114) 
 panic (kernel/panic.c:262 kernel/panic.c:361) 
 timer_handler.cold (arch/um/kernel/time.c:51 arch/um/kernel/time.c:510 arch/um/kernel/time.c:634) 
 timer_real_alarm_handler (arch/um/os-Linux/signal.c:109) 
 unblock_signals (arch/um/os-Linux/signal.c:338) 
 tick_nohz_idle_exit (kernel/time/tick-sched.c:1364) 
 do_idle (kernel/sched/idle.c:310) 
 cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1)) 
 kernel_init (init/main.c:1435) 
 0x60001ce6 
 0x6000220e 
 0x60004961 
 new_thread_handler (arch/um/include/asm/thread_info.h:46 arch/um/kernel/process.c:136) 
 uml_finishsetup (arch/um/kernel/um_arch.c:268) 
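
For illustration only, here is a minimal userspace sketch of the kind of
interleaving that can produce a "time goes backwards" report like the one
above: a target time is computed, an interrupt advances the clock, and the
stale target is then stored.  This is not the code in arch/um/kernel/time.c;
every name in it (sim_time, sim_raise_irq(), update_unprotected(),
update_protected(), the 20-unit step) is a hypothetical stand-in, and the
"interrupt" is injected deterministically rather than delivered as a signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
static uint64_t sim_time;       /* plays the role of the simulated clock */
static bool sim_irqs_disabled;  /* plays the role of the IRQ-disabled state */
static bool sim_irq_pending;    /* an interrupt deferred while disabled */

/* The simulated interrupt handler moves time forward by 20 units. */
static void sim_irq_handler(void)
{
	sim_time += 20;
}

/* Deliver the simulated interrupt now, or defer it while "disabled". */
static void sim_raise_irq(void)
{
	if (sim_irqs_disabled)
		sim_irq_pending = true;
	else
		sim_irq_handler();
}

/* Re-enable "interrupts" and run anything that was deferred. */
static void sim_irq_restore(void)
{
	sim_irqs_disabled = false;
	if (sim_irq_pending) {
		sim_irq_pending = false;
		sim_irq_handler();
	}
}

/*
 * Unprotected relative update: the interrupt lands between computing
 * the target and storing it, so the stale target overwrites newer time.
 * Returns true if the clock moved backwards.
 */
static bool update_unprotected(uint64_t offs)
{
	uint64_t target = sim_time + offs;
	uint64_t before;

	sim_raise_irq();	/* handler runs here and advances sim_time */
	before = sim_time;
	sim_time = target;
	return sim_time < before;
}

/*
 * Protected relative update: the same interrupt is deferred until the
 * read-modify-write has completed, so it only ever adds to the result.
 */
static bool update_protected(uint64_t offs)
{
	uint64_t target, before;

	assert(!sim_irqs_disabled);	/* this sketch does not nest */
	sim_irqs_disabled = true;
	target = sim_time + offs;
	sim_raise_irq();	/* deferred, sim_time is unchanged */
	before = sim_time;
	sim_time = target;
	sim_irq_restore();	/* deferred handler runs now */
	return sim_time < before;
}
```

Starting from sim_time = 100 with an offset of 11, update_unprotected()
reports a backwards step (120 -> 111), while update_protected() ends at 131
with no backwards step.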


