[PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit
Zeng Heng
zengheng4 at huawei.com
Tue Nov 4 05:48:47 PST 2025
Hi Ben,
On 2025/11/4 18:24, Ben Horgan wrote:
> Hi Zeng,
>
> On 11/3/25 03:47, Zeng Heng wrote:
>> Hi Ben,
>>
>> On 2025/10/30 17:52, Ben Horgan wrote:
>>> Hi Zeng,
>>>
>>> On 10/29/25 07:56, Zeng Heng wrote:
>>>> The MSMON_MBWU register accumulates counts monotonically and is not
>>>> automatically cleared to zero on overflow. The overflow portion is
>>>> exactly what mpam_msmon_overflow_val() computes, so there is no need to
>>>> additionally subtract mbwu_state->prev_val.
>>>>
>>>> Before invoking write_msmon_ctl_flt_vals(), the overflow bit of the
>>>> MSMON_MBWU register must first be read to prevent it from being
>>>> inadvertently cleared by the write operation.
>>>>
>>>> Finally, use the overflow bit instead of relying on counter wrap-around
>>>> to determine whether an overflow has occurred. This avoids the case
>>>> where a wrap-around that leaves now > prev_val is overlooked. With this
>>>> change, prev_val no longer has any use, so remove it.
>>>>
>>>> CC: Ben Horgan <ben.horgan at arm.com>
>>>> Signed-off-by: Zeng Heng <zengheng4 at huawei.com>
>>>> ---
>>>> drivers/resctrl/mpam_devices.c | 22 +++++++++++++++++-----
>>>> drivers/resctrl/mpam_internal.h | 3 ---
>>>> 2 files changed, 17 insertions(+), 8 deletions(-)
>>>
>>> This all looks fine for overflow, but what we've been forgetting about
>>> is the power management. As James mentioned in his commit message, the
>>> check that prev_val is greater than now is doing double duty. If an MSC
>>> is powered down and reset, then we lose the count. Hence, to keep an
>>> accurate count, we should be considering this case too.
>>>
>>
>>
>> Regarding CPU power management and CPU on-/off-line scenarios, this
>> should be, and already is, handled by mpam_save_mbwu_state():
>>
>> 1. Freezes the current MSMON_MBWU counter into the
>> mbwu_state->correction;
>> 2. Clears the MSMON_MBWU counter;
>>
>> After the CPU is powered back on, the total bandwidth traffic is
>> MSMON_MBWU(the `now` variable) + correction.
>>
>> So the above solution also covers CPU power-down scenarios, and no
>> additional code is needed to adapt to this case.
>>
>> If I've missed anything, thanks in advance for pointing it out.
>>
>
> No, I don't think you missed anything. You just didn't mention in your commit message
> that this is also fixing the power management case.
>
> I'm going to post the next version of this series for James as he is otherwise engaged.
> I've taken your patch and adapted it to fit in with the order of patches.
> Does this look ok to you? The support for the long counters will be added later.
>
Yes, I have reviewed the patch, and the related adaptations look good to
me.
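
For anyone following along, the save/restore accounting discussed earlier
in this thread (freeze the counter into the correction, clear the counter,
then report now + correction) can be sketched roughly as below. This is a
user-space model only, not the driver code: the fake_* structs and
functions are hypothetical names made up for illustration, and the model
assumes the correction accumulates across repeated saves.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware MSMON_MBWU counter. */
struct fake_mbwu_hw {
	uint64_t counter;
};

/* Hypothetical stand-in for the per-monitor software state. */
struct fake_mbwu_state {
	uint64_t correction;
};

/*
 * Model of the save step before power-down:
 * 1. fold the current counter value into the correction;
 * 2. clear the counter.
 */
static void fake_save_mbwu_state(struct fake_mbwu_hw *hw,
				 struct fake_mbwu_state *st)
{
	st->correction += hw->counter;
	hw->counter = 0;
}

/* Total traffic as seen by the reader: now + correction. */
static uint64_t fake_read_total(const struct fake_mbwu_hw *hw,
				const struct fake_mbwu_state *st)
{
	return hw->counter + st->correction;
}
```

In this model, saving a counter of 100 and then counting 40 more after
power-up gives a total of 140, so no count is lost across the reset.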
> @@ -1016,6 +1025,9 @@ static void __ris_msmon_read(void *arg)
>  	if (config_mismatch) {
>  		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
>  		overflow = false;
> +	} else if (overflow) {
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL,
> +				      cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS);
>  	}
Yes, the clear register operation is added here.
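
To illustrate the ordering, here is a rough user-space model of the
read-then-acknowledge flow: sample the overflow status before any write
to the control register, account for the overflow in software, then write
the control value back with only the status bit cleared. Everything named
FAKE_* or fake_* is hypothetical; in particular the bit positions and the
31-bit counter width are assumptions for the sketch, not values taken
from the driver or the spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define FAKE_CTL_EN		(UINT32_C(1) << 31)
#define FAKE_CTL_OFLOW_STATUS	(UINT32_C(1) << 26)
/* Assumed amount represented by one overflow of a 31-bit counter. */
#define FAKE_OVERFLOW_VAL	(UINT64_C(1) << 31)

/* Model of one monitor: control register, counter, software correction. */
struct fake_mon {
	uint32_t ctl;
	uint64_t counter;
	uint64_t correction;
};

static uint64_t fake_read_and_ack(struct fake_mon *m)
{
	/* Read the status first so a later write cannot hide it. */
	uint32_t cur_ctl = m->ctl;
	bool overflow = (cur_ctl & FAKE_CTL_OFLOW_STATUS) != 0;

	if (overflow) {
		/* Account for the wrapped portion in software. */
		m->correction += FAKE_OVERFLOW_VAL;
		/* Clear only the status bit, keep the other bits. */
		m->ctl = cur_ctl & ~FAKE_CTL_OFLOW_STATUS;
	}

	return m->counter + m->correction;
}
```

A second read with no new overflow returns the same total, because the
status bit was cleared by the first read and the correction is applied
only once.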
Best Regards,
Zeng Heng