[PATCH] um: add RCU syscall hack for time-travel

Richard Weinberger richard at nod.at
Fri Sep 13 05:32:52 PDT 2024


Hi!

----- Original Message -----
> From: "Benjamin Berg" <benjamin at sipsolutions.net>
> First, it doesn't seem like my patch actually works, so please do not
> merge it. It actually appears that tree RCU and tiny RCU (which are
> selected depending on the preemption setting) are behaving differently.
> 
> So now I am wondering if I can come up with a hack that works for both.

Ok!
 
> On Fri, 2024-09-13 at 13:47 +0200, Richard Weinberger wrote:
>> ----- Original Message -----
>> > From: "Benjamin Berg" <benjamin at sipsolutions.net>
>> > > While I acknowledge that time-travel itself is a beautiful hack, I'd
>> > > like to keep the hacks needed to keep it working minimal.
>> > > So, the problem here is that RCU callbacks never run and just pile up?
>> > 
>> > Yes. A simple example of this is doing a "find /". This will allocate a
>> > lot of inode information which is only freed at a later point.
>> > 
>> > > I wonder why such a situation does not happen in a nohz_full setup on
>> > > regular systems.
>> > 
>> > Had to search for a bit. But I think the boot CPU will still have a
>> > tick even on a NOHZ_FULL setup. See the nohz_full= boot parameter.
>> > 
>> > It does look like the RCU code might try to force scheduling (tiny RCU)
>> > or wake up a worker (tree RCU) in these situations. But neither of
>> > these attempts is going to fix the situation as there will be no call
>> > to rcu_sched_clock_irq with time-travel.
>> 
>> Agreed. I think having a housekeeping CPU (thread) will not work in
>> time-travel mode.
>> Kicking RCU whenever a syscall is executed is okay. The question is,
>> are there other scenarios where RCU work can pile up and no syscall is
>> run for a long time? Maybe we need to kick it at other places too
>> (page fault handler?).
> 
> Hmm, that is a good question. I assume that implies major faults for
> mapped files (or anonymous memory from swap) happening. I suppose that
> can trigger just about anything in the kernel and could also create
> load on the RCU. Not sure how problematic that is; in our case it was
> python importing a large number of files and bringing the system to its
> knees in the process.

I also had workloads like heavy network processing without userspace
interaction in mind.
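
To illustrate the concern, here is a toy userspace C model. It is not
kernel code and every name in it is hypothetical. It assumes that each
event queues exactly one callback and that only the syscall path kicks
RCU, which is a simplification of the hack under discussion:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_MAX_PENDING 1024

struct toy_rcu {
	size_t pending;	/* callbacks queued, waiting to be invoked */
	size_t invoked;	/* callbacks invoked so far */
};

enum toy_event {
	TOY_SYSCALL,	/* work entered through a syscall */
	TOY_NET_RX,	/* in-kernel work, no userspace involved */
};

/* Stand-in for queueing a deferred free. */
static void toy_call_rcu(struct toy_rcu *r)
{
	assert(r->pending < TOY_MAX_PENDING);
	r->pending++;
}

/* Stand-in for "kicking" RCU: pretend a grace period ends right away. */
static void toy_kick(struct toy_rcu *r)
{
	r->invoked += r->pending;
	r->pending = 0;
}

/*
 * Process n events of one kind. Each event queues one callback, but
 * only syscalls kick RCU. Returns the peak callback backlog.
 */
static size_t toy_run(struct toy_rcu *r, enum toy_event ev, int n)
{
	size_t peak = 0;

	for (int i = 0; i < n; i++) {
		toy_call_rcu(r);
		if (r->pending > peak)
			peak = r->pending;
		if (ev == TOY_SYSCALL)
			toy_kick(r);
	}
	return peak;
}
```

With this model, toy_run() over 500 TOY_SYSCALL events never has more
than one callback pending, while 500 TOY_NET_RX events leave all 500
queued and none invoked.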
 
> Anyway, I'll need to reconsider the hack a bit, maybe we can find a
> better solution.

We can also bring the RCU folks into the loop. But I guess they will
first need a good introduction to what time-travel is. :-D

Thanks,
//richard


