[PATCH v5 31/41] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()

Zeng Heng zengheng4 at huawei.com
Mon Mar 9 20:23:23 PDT 2026


Hi Ben,

On 2026/3/10 0:30, Ben Horgan wrote:
> Hi Zeng,
> 
> On 3/7/26 09:29, Zeng Heng wrote:
>> Hi Ben,
>>
>> On 2026/2/25 1:57, Ben Horgan wrote:
>>> From: James Morse <james.morse at arm.com>
>>>
>>> resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation
>>> means the counter may need reading in three different ways. The same
>>> goes for reset.
>>>
>>> The helpers behind the resctrl_arch_ functions will be re-used for the
>>> ABMC equivalent functions.
>>>
>>> Add the rounding helper for checking monitor values while we're here.
>>>
>>> Tested-by: Gavin Shan <gshan at redhat.com>
>>> Tested-by: Shaopeng Tan <tan.shaopeng at jp.fujitsu.com>
>>> Tested-by: Peter Newman <peternewman at google.com>
>>> Tested-by: Zeng Heng <zengheng4 at huawei.com>
>>> Reviewed-by: Shaopeng Tan <tan.shaopeng at jp.fujitsu.com>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron at huawei.com>
>>> Signed-off-by: James Morse <james.morse at arm.com>
>>> Signed-off-by: Ben Horgan <ben.horgan at arm.com>
>>> ---
>>
>> [...]
>>
>>> +
>>> +static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
>>> +                 enum mpam_device_features mon_type,
>>> +                 int mon_idx, u32 closid, u32 rmid, u64 *val)
>>> +{
>>> +    if (cdp_enabled) {
>>
>> While reviewing the resctrl limbo handling code, I noticed an issue in
>> __check_limbo() that could lead to premature RMID release when CDP is
>> enabled.
>>
>> In __check_limbo(), RMIDs in limbo state undergo L3 occupancy checks
>> before being released. This check is performed via
>> resctrl_arch_rmid_read(), which on arm64 MPAM relies on the cdp_enabled
>> state to determine which PARTID to check.
>>
>> The concern arises in the following scenario: Filesystem is mounted with
>> CDP enabled. During normal operation, some RMIDs enter limbo. On umount,
>> cdp_enabled is reset to false. __check_limbo() may then run and perform
>> L3 checks with cdp_enabled = false. This could cause RMIDs to be
>> incorrectly released from limbo while still effectively busy after
>> remount.
> 
> I think a stale limbo list causes more problems than that. If you mount
> with cdp disabled, cause some rmids to be dirty, unmount and then
> remount with cdp enabled, then you may have some of the entries in the
> upper half marked as busy, but when the limbo code checks them it ends up
> using an out-of-range partid and may trigger an mpam error interrupt.
> 
> To avoid a stale list we could disable the limbo checking at unmount and
> at remount remake the bitmap. This would involve some resctrl changes
> which I will have a further look into. For now, to avoid the dependency
> without a lot of patch churn in this series I think we can hide the cdp
> enablement behind CONFIG_EXPERT. Does that sound ok to you?
> 
> Thanks,
> 
> Ben
> 

Confirmed. Toggling between non-CDP and CDP mount modes leads to
out-of-range PARTID hardware errors and memory access violations. This
can cause MPAM to halt by provoking mpam_broken_work.

I agree that properly fixing this will require resctrl modifications to
handle the limbo state across mount cycles. Hiding CDP behind
CONFIG_EXPERT is acceptable as a short-term mitigation to prevent users
from hitting this bug accidentally.


Best regards,
Zeng Heng


