preempted dup_mm misses TLB invalidate

Wed Jul 17 16:34:09 EDT 2013

On Wed, Jul 17, 2013 at 01:09:52PM -0700, Nickolas Fortino wrote:
> The problem is eventually a user process performs a store which hits on  
> a writeable TLB entry with the PTE marked as read only. Is it supposed  
> to be possible for a user threading bug to end up in this state?

I've thought about that, and I'm not sure what we can do about this.
Moreover, I really don't think it matters at all.

Let's consider an SMP system running a multithreaded application.  CPUs
0 and 1 are running two threads.  CPU 1 is about to do a fork, but CPU 0
is doing a large, time-consuming memcpy().

CPU 1 does the fork while CPU 0 is still running this large memcpy().  It
walks the page tables, setting the PTEs to read-only.  Let's say for
argument's sake that it immediately invalidates the TLB entry for each PTE
after modifying it.

There is still a window in which CPU 0 can hit the stale writeable TLB
entry even though the PTE has already been write-protected.  The only way
to close this window is to stop all threads of the process that is doing
the fork().

However, before we think "oh, that sounds like a solution", let's think
about this a bit more first.

Let's say that we are on a system which doesn't need any TLB maintenance.
In other words, all PTE updates are seen by all observers immediately.

Consider the above scenario again.  What is the state of the memory at
the point the fork() returns, as seen from both the multithreaded parent's
point of view and the child's point of view?  Can you predict where in
that memcpy() CPU 0 will have been (and therefore what data the child can
see from that memcpy())?

The answer is you can't, because you don't know if CPU 0 might have had
an interrupt to deal with which stole time away from the memcpy().  You
don't know the relative timing of CPU 0's loads/stores against the time
it took CPU 1 to mark the PTE read-only.

Even if you stopped all threads on entry to a fork, the same problem
exists - at the point that you stopped the other threads, how do you know
what data they've written to memory?

What I'm pointing out here is that in this situation, the data visible to
the child process is unpredictable.
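
To make that concrete, here is a rough user-space sketch of the scenario
(not taken from any kernel or library source; the names copier,
bytes_seen_by_child and COPY_SIZE are made up for illustration).  One
thread runs a large memcpy() while another calls fork(), and the child
counts how much of the copy made it into its snapshot.  The count is
expected to vary from run to run, anywhere between zero and the full size.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define COPY_SIZE (16u * 1024u * 1024u)   /* hypothetical size, 16 MiB */

static unsigned char *src_buf, *dst_buf;

/* Plays the role of "CPU 0": one large memcpy() with no synchronisation. */
static void *copier(void *arg)
{
    (void)arg;
    memcpy(dst_buf, src_buf, COPY_SIZE);
    return NULL;
}

/* Returns how many copied bytes the child saw, or -1 on error. */
static long bytes_seen_by_child(void)
{
    pthread_t tid;
    int fds[2];
    pid_t pid;
    long seen = -1;

    src_buf = malloc(COPY_SIZE);
    dst_buf = calloc(COPY_SIZE, 1);        /* destination starts as zeros */
    if (!src_buf || !dst_buf || pipe(fds) != 0)
        return -1;
    memset(src_buf, 0xAA, COPY_SIZE);

    if (pthread_create(&tid, NULL, copier, NULL) != 0)
        return -1;

    pid = fork();                          /* races with the memcpy() */
    if (pid == 0) {
        /* Child: the copier thread does not exist here, so dst_buf is a
         * frozen snapshot.  Only async-signal-safe calls are used. */
        long n = 0;
        size_t i;

        for (i = 0; i < COPY_SIZE; i++)
            n += (dst_buf[i] == 0xAA);
        if (write(fds[1], &n, sizeof(n)) != (ssize_t)sizeof(n))
            _exit(1);
        _exit(0);
    }

    pthread_join(tid, NULL);
    close(fds[1]);                         /* so read() cannot block forever */
    if (pid > 0) {
        if (read(fds[0], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
            seen = -1;
        waitpid(pid, NULL, 0);
    }
    close(fds[0]);
    free(src_buf);
    free(dst_buf);
    return seen;
}
```

Calling bytes_seen_by_child() repeatedly and printing the result should
show different values depending on scheduling, which is the point: nothing
the kernel does with the TLB changes the fact that the snapshot lands at
an arbitrary point in the copy.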

So, does it matter if a thread hits a page which has been marked read-only
in the PTE but hasn't been invalidated yet?  The answer to that is no -
because the parent and the child will see the update, and it will be
absolutely no different from what would have happened if the store had
happened _just before_ the PTE was marked read-only.

I'm pretty convinced that if you need to rely on a multi-threaded
program's state at the point you fork(), you must have some way to quiesce
your other threads _in user space_ rather than hoping that the kernel has
some magic to patch over this.
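
One possible way to do that quiescing, sketched under assumptions: every
thread that modifies the shared state does so while holding a mutex, and
the forking thread takes the same mutex around fork().  This is only an
illustration, not the only or the recommended scheme, and the names
state_lock, shared_buf, worker and fork_quiesced are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE 4096

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char shared_buf[BUF_SIZE];  /* consistent == all bytes equal */
static atomic_int stop_workers;

/* Worker: rewrites the whole buffer with one value, always under the lock. */
static void *worker(void *arg)
{
    const struct timespec pause = { 0, 100000 };  /* 100us outside the lock */
    unsigned char v = 1;

    (void)arg;
    while (!atomic_load(&stop_workers)) {
        pthread_mutex_lock(&state_lock);
        memset(shared_buf, v, BUF_SIZE);
        pthread_mutex_unlock(&state_lock);
        v = (unsigned char)(v % 250 + 1);
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Returns 0 if the child saw a consistent buffer, 1 if torn, -1 on error. */
static int fork_quiesced(void)
{
    pthread_t tid;
    pid_t pid;
    int status;

    atomic_store(&stop_workers, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* Holding the lock means the worker is not in the middle of an update. */
    pthread_mutex_lock(&state_lock);
    pid = fork();
    if (pid == 0) {
        /* Child: check the snapshot using only async-signal-safe calls. */
        size_t i;

        for (i = 1; i < BUF_SIZE; i++)
            if (shared_buf[i] != shared_buf[0])
                _exit(1);
        _exit(0);
    }
    pthread_mutex_unlock(&state_lock);

    atomic_store(&stop_workers, 1);
    pthread_join(tid, NULL);

    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the forking thread owns the lock when fork() runs, the child's
snapshot cannot contain a half-written buffer, regardless of when the
kernel write-protects the pages or invalidates the TLB.  POSIX
pthread_atfork() handlers are another place where this kind of locking is
sometimes hooked in.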