[PATCH v2 0/8] Cache coherency management subsystem

H. Peter Anvin hpa at zytor.com
Thu Jul 10 11:55:33 PDT 2025


On July 10, 2025 11:45:40 AM PDT, dan.j.williams at intel.com wrote:
>Peter Zijlstra wrote:
>> On Wed, Jul 09, 2025 at 10:22:40PM -0700, dan.j.williams at intel.com wrote:
>> 
>> > "Regular?", no. Something is wrong if you are doing this regularly. In
>> > current CXL systems the expectation is to suffer a WBINVD event once per
>> > server provisioning event.
>> 
>> Ok, so how about we strictly track this once, and when it happens more
>> than this once, we error out hard?
>> 
>> > Now, there is a nascent capability called "Dynamic Capacity Devices"
>> > (DCD) where the CXL configuration is able to change at runtime with
>> > multiple hosts sharing a pool of memory. Each time the physical memory
>> > capacity changes, cache management is needed.
>> > 
>> > For DCD, I think the negative effects of WBINVD are a *useful* stick to
>> > move device vendors to stop relying on software to solve this problem.
>> > They can implement an existing CXL protocol where the device tells CPUs
>> > and other CXL.cache agents to invalidate the physical address ranges
>> > that the device owns.
>> > 
>> > In other words, if WBINVD makes DCD unviable, that is a useful outcome
>> > because it motivates unburdening Linux long term with this problem.
>> 
>> Per the above, I suggest we not support this feature *AT*ALL* until an
>> alternative to WBINVD is provided.
>> 
>> > In the near term though, current CXL platforms that do not support
>> > device-initiated-invalidate still need coarse cache management for those
>> > original, infrequent provisioning events. Folks that want to go further
>> > and attempt frequent DCD events with WBINVD get to keep all the pieces.
>> 
>> I would strongly prefer those pieces to include WARNs and/or worse.
>
>That is fair. It is not productive for the CXL subsystem to sit back and
>hope that people notice the destructive side-effects of wbinvd and hope
>that leads to device changes.
>
>This discussion has me reconsidering that yes, it would indeed be better
>to run a clflushopt loop over potentially terabytes on all CPUs. That should
>only be suffered rarely for the provisioning case, and for the DCD case
>the potential add/remove events should be more manageable.
>
>drm already has drm_clflush_pages() for bulk cache management; CXL
>should just align on that approach.

Let's not be flippant; looping over terabytes could take *hours*. But those are hours during which the system is alive, and only one CPU needs to be looping.
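
For concreteness, the single-CPU flavor of that loop, in the spirit of the
existing clflush_cache_range() and drm_clflush_pages() helpers, would look
roughly like the sketch below. This is illustrative only, not the actual CXL
patch: it assumes the region is already mapped (e.g. via memremap()) and that
the CPU has CLFLUSHOPT, and the cxl_flush_region() name is made up here.

#include <linux/sched.h>
#include <linux/sizes.h>
#include <linux/types.h>
#include <asm/special_insns.h>
#include <asm/processor.h>

/* Sketch: flush an already-mapped range by cache line on one CPU */
static void cxl_flush_region(void *vaddr, resource_size_t size)
{
	unsigned int stride = boot_cpu_data.x86_clflush_size;
	void *end = vaddr + size;
	void *p;

	for (p = vaddr; p < end; p += stride) {
		clflushopt(p);
		/* Terabyte-scale loops: stay preemptible so this CPU is not lost for hours */
		if (!((unsigned long)p & (SZ_2M - 1)))
			cond_resched();
	}
	/* CLFLUSHOPT is weakly ordered; fence before declaring the range clean */
	mb();
}

Real code would also want to check X86_FEATURE_CLFLUSHOPT (falling back to
CLFLUSH), but the point stands: slow, yet preemptible and confined to one CPU.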

The other question is: what happens if memory is unplugged and a cache line is then evicted? I'm guessing that existing memory hotplug solutions simply drop the writeback, since the OS knows there is no valid memory there, and so any cached data is inherently worthless.
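
As for Peter's earlier point about erroring out hard when the coarse flush is
requested more than once, that guard could be as simple as the hypothetical
sketch below (the cxl_* names are invented here, not taken from the posted
series):

#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/errno.h>
#include <asm/smp.h>	/* wbinvd_on_all_cpus(), x86-only */

static atomic_t cxl_wbinvd_events = ATOMIC_INIT(0);

/* Sketch: allow the WBINVD-based flush once, then refuse loudly */
static int cxl_coarse_flush(void)
{
	if (atomic_inc_return(&cxl_wbinvd_events) > 1) {
		WARN_ONCE(1, "repeated WBINVD-based flush requested; device should invalidate its own ranges\n");
		return -EBUSY;
	}
	return wbinvd_on_all_cpus();
}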


