[PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters

Peter Newman peternewman at google.com
Mon Jan 19 04:47:52 PST 2026


Hi James,

On Mon, Jan 19, 2026 at 1:04 PM James Morse <james.morse at arm.com> wrote:
>
> Hi Peter,
>
> On 15/01/2026 15:49, Peter Newman wrote:
> > On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan at arm.com> wrote:
> >> From: James Morse <james.morse at arm.com>
> >>
> >> resctrl has two types of counters, NUMA-local and global. MPAM has only
> >> bandwidth counters, but the position of the MSC may mean it counts
> >> NUMA-local, or global traffic.
> >>
> >> But the topology information is not available.
> >>
> >> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
> >> probably NUMA-local. If the memory controller supports bandwidth monitors,
> >> they are probably global.
>
> > Are remote memory accesses not cached? How do we know an MBWU monitor
> > residing on a cache won't count remote traffic?
>
> It will, yes you get double counting. Is forbidding both mbm_total and mbm_local preferable?
>
> I think this comes from 'total' in mbm_total not really having the obvious meaning of the
> word:
> Suppose NUMA-A has CPUs and no memory controllers, and NUMA-B has no CPUs and all the
> memory controllers.
> With MPAM we've only got one bandwidth counter, and it doesn't know where the traffic goes
> after the MSC. mbm_local on the L3 would reflect all the bandwidth, and mbm_total on the
> memory controllers would have the same number.
> I think on x86 mbm_local on the CPUs would read zero as zero traffic went to the 'local'
> memory controller, and mbm_total would reflect all the memory bandwidth. (so 'total'
> really means 'other')

Our software is going off the definition from the Intel SDM:

"This event monitors the L3 external bandwidth satisfied by the local
memory. In most platforms that support this event, L3 requests are
likely serviced by a memory system with non-uniform memory
architecture. This allows bandwidth to off-package memory resources to
be tracked by subtracting local from total bandwidth (for instance,
bandwidth over QPI to a memory controller on another physical
processor could be tracked by subtraction)."

On NUMA-capable hardware that supports this event and where all memory
is local, mbm_local == mbm_total. In practice you can't read both at
the same instant from userspace, so if you read mbm_total first,
mbm_local will have advanced by the time you read it, and you'll
probably get a small negative result for remote bandwidth.
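
A minimal userspace sketch of that subtraction, in Python. The file
names mbm_local_bytes and mbm_total_bytes are the ones resctrl exposes
under a mon_data domain directory (for example
/sys/fs/resctrl/mon_data/mon_L3_00, assuming the usual mount point).
The function names, the read order and the clamp to zero are
illustrative assumptions, not taken from any existing tool:

```python
from pathlib import Path


def read_counter(path):
    """Return the byte count in a resctrl counter file, or None if unreadable."""
    text = Path(path).read_text().strip()
    # resctrl may report "Unavailable" or "Error" instead of a number.
    return int(text) if text.isdigit() else None


def remote_bytes(total, local):
    """Remote traffic by subtraction, clamped so read skew never goes negative."""
    return max(total - local, 0)


def sample_remote(mon_dir):
    """Read local first, then total, so read skew cannot make the result negative."""
    local = read_counter(Path(mon_dir) / "mbm_local_bytes")
    total = read_counter(Path(mon_dir) / "mbm_total_bytes")
    if local is None or total is None:
        return None
    return remote_bytes(total, local)
```

Reading mbm_local first turns the skew into a small overestimate of
remote traffic instead of a negative number; the clamp covers the
opposite read order.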

>
> I think what MPAM is doing here is still useful as a system normally has both CPUs and
> memory controllers in the NUMA nodes, and you can use this to spot a control/monitor group
> on a NUMA-node that is hammering all the memory (outlier mbm_local), or the same where a
> NUMA-node's memory controller is getting hammered by all the NUMA nodes (outlier
> mbm_total)
>
> I've not heard of a platform with both memory bandwidth monitors at L3 and the memory
> controller, so this may be a theoretical issue.
>
> Shall we only expose one of mbm-local/total to prevent this being seen by user-space?

I believe that in the current software design MPAM can only support
mbm_total: an individual MSC (or class of MSCs with the same
configuration) can't separate traffic by destination, so what it
reports must be the combined value. On a hardware design where MSCs
were placed such that one only counts local traffic and another only
counts remote, the resctrl MPAM driver would have to understand the
hardware configuration well enough to produce counts following
Intel's definition of mbm_local and mbm_total.
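
As a rough illustration of the two cases, here is a Python sketch of
the mapping. It is not driver code, both function names are
hypothetical, and it assumes the hardware hands back plain byte
counts:

```python
def single_msc_counts(msc_bytes):
    """One MSC cannot split traffic by destination: only the combined value exists."""
    return {"mbm_total_bytes": msc_bytes}


def split_msc_counts(local_msc_bytes, remote_msc_bytes):
    """Map two hypothetical per-destination MSC counts onto Intel's definitions."""
    return {
        "mbm_local_bytes": local_msc_bytes,
        # Total covers all traffic, so it is the sum of both destinations.
        "mbm_total_bytes": local_msc_bytes + remote_msc_bytes,
    }
```

With these definitions, total minus local recovers the remote count in
the split case, which is what the subtraction in the SDM text relies
on.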

Thanks,
-Peter


