[PATCH v2 0/8] Cache coherency management subsystem
H. Peter Anvin
hpa at zytor.com
Wed Jul 9 22:31:11 PDT 2025
On July 9, 2025 10:22:40 PM PDT, dan.j.williams at intel.com wrote:
>Peter Zijlstra wrote:
>> On Wed, Jun 25, 2025 at 02:12:39AM -0700, H. Peter Anvin wrote:
>> > On June 25, 2025 1:52:04 AM PDT, Peter Zijlstra <peterz at infradead.org> wrote:
>> > >On Tue, Jun 24, 2025 at 04:47:56PM +0100, Jonathan Cameron wrote:
>> > >
>> > >> On x86 there is the much loved WBINVD instruction that causes a write back
>> > >> and invalidate of all caches in the system. It is expensive but it is
>> > >
>> > >Expensive is not the only problem. It actively interferes with things
>> > >like Cache-Allocation-Technology (RDT-CAT for the intel folks). Doing
>> > >WBINVD utterly destroys the cache subsystem for everybody on the
>> > >machine.
>> > >
>> > >> necessary in a few corner cases.
>> > >
>> > >Don't we have things like CLFLUSH/CLFLUSHOPT/CLWB exactly so that we can
>> > >avoid doing dumb things like WBINVD ?!?
>> > >
>> > >> These are cases where the contents of
>> > >> Physical Memory may change without any writes from the host. Whilst there
>> > >> are a few reasons this might happen, the one I care about here is when
>> > >> we are adding or removing mappings on CXL. So typically going from
>> > >> there being actual memory at a host Physical Address to nothing there
>> > >> (reads as zero, writes dropped) or vice versa.
>> > >
>> > >> The
>> > >> thing that makes it very hard to handle with CPU flushes is that the
>> > >> instructions are normally VA based and not guaranteed to reach beyond
>> > >> the Point of Coherence or similar. You might be able to (ab)use
>> > >> various flush operations intended for persistent memory, but in
>> > >> general they don't work either.
>> > >
>> > >Urgh so this. Dan, Dave, are we getting new instructions to deal with
>> > >this? I'm really not keen on having WBINVD in active use.
>> > >
>> >
>> > WBINVD is the nuclear weapon to use when you have lost all notion of
>> > where the problematic data can be, and amounts to a full reset of the
>> > cache system.
>> >
>> > WBINVD can block interrupts for many *milliseconds*, system wide, and
>> > so is really only useful for once-per-boot type events, like MTRR
>> > initialization.
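For context on the "system wide" part: WBINVD only flushes the caches of
the CPU that executes it, so to flush everything the kernel has to IPI
every CPU and have each one run it, roughly what x86's
wbinvd_on_all_cpus() helper does. A minimal sketch (simplified, not the
actual kernel implementation):

#include <linux/smp.h>

static void do_wbinvd(void *unused)
{
	/* Write back and invalidate this CPU's entire cache hierarchy.
	 * Serializing, and can stall this CPU for milliseconds. */
	asm volatile("wbinvd" : : : "memory");
}

static void flush_all_caches(void)
{
	/* IPI every online CPU and wait for completion: this is the
	 * system-wide stall described above. */
	on_each_cpu(do_wbinvd, NULL, 1);
}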
>>
>> Right, this... But that CXL thing sounds like it is semi-'regular', to
>> the point that providing some infrastructure around it makes sense. This
>> should not be the case.
>
>"Regular?", no. Something is wrong if you are doing this regularly. In
>current CXL systems the expectation is to suffer WBINVD once per
>server provisioning event.
>
>Now, there is a nascent capability called "Dynamic Capacity Devices"
>(DCD) where the CXL configuration is able to change at runtime with
>multiple hosts sharing a pool of memory. Each time the physical memory
>capacity changes, cache management is needed.
>
>For DCD, I think the negative effects of WBINVD are a *useful* stick to
>move device vendors to stop relying on software to solve this problem.
>They can implement an existing CXL protocol where the device tells CPUs
>and other CXL.cache agents to invalidate the physical address ranges
>that the device owns.
>
>In other words, if WBINVD makes DCD unviable, that is a useful outcome
>because it motivates unburdening Linux of this problem in the long term.
>
>In the near term though, current CXL platforms that do not support
>device-initiated invalidate still need coarse cache management for
>those original, infrequent provisioning events. Folks that want to go further
>and attempt frequent DCD events with WBINVD get to keep all the pieces.
Since this is presumably rare, it might be better to loop and clflush, even though it will take longer, rather than stopping the world.
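That loop could look roughly like the sketch below. flush_hpa_range() is
a hypothetical helper (the line-by-line CLFLUSH/CLFLUSHOPT walk it wraps
is what the kernel's clflush_cache_range() actually provides), and it
assumes the VA-based flush really does reach past the point of coherence
for the device in question, per Jonathan's caveat above:

#include <asm/cacheflush.h>	/* clflush_cache_range() */
#include <linux/io.h>		/* memremap(), memunmap() */

/* Hypothetical helper: flush a host physical address range whose
 * backing is about to change, e.g. a CXL region being unmapped. */
static int flush_hpa_range(phys_addr_t start, unsigned int len)
{
	void *vaddr = memremap(start, len, MEMREMAP_WB);

	if (!vaddr)
		return -ENOMEM;

	/* Flush one cache line at a time (CLFLUSHOPT where supported),
	 * then fence. Slower than WBINVD for large ranges, but it only
	 * evicts the lines we care about instead of stopping the world. */
	clflush_cache_range(vaddr, len);
	memunmap(vaddr);
	return 0;
}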