[RFC PATCH v7 29/31] x86/mm/pti: Implement a TLB flush immediately after a switch to kernel CR3

Wed Nov 19 06:31:30 PST 2025

On Fri, Nov 14, 2025, at 7:14 AM, Valentin Schneider wrote:
> Deferring kernel range TLB flushes requires the guarantee that upon
> entering the kernel, no stale entry may be accessed. The simplest way to
> provide such a guarantee is to issue an unconditional flush upon switching
> to the kernel CR3, as this is the pivoting point where such stale entries
> may be accessed.
>

Doing this together with the PTI CR3 switch has no actual benefit: MOV CR3 doesn’t flush global pages. And doing this in asm is pretty gross.  We don’t even get a free sync_core() out of it because INVPCID is not documented as being serializing.

Why can’t we do it in C?  What’s the actual risk?  In order to trip over a stale TLB entry, we would need to deference a pointer to newly allocated kernel virtual memory that was not valid prior to our entry into user mode. I can imagine BPF doing this, but plain noinstr C in the entry path?  Especially noinstr C *that has RCU disabled*?  We already can’t follow an RCU pointer, and ISTM the only style of kernel code that might do this would use RCU to protect the pointer, and we are already doomed if we follow an RCU pointer to any sort of memory.

We do need to watch out for NMI/MCE hitting before we flush.