[PATCH v5 31/41] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()
Zeng Heng
zengheng4 at huawei.com
Sat Mar 7 01:29:13 PST 2026
Hi Ben,
On 2026/2/25 1:57, Ben Horgan wrote:
> From: James Morse <james.morse at arm.com>
>
> resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation means
> the counter may need reading in three different ways. The same goes for
> reset.
>
> The helpers behind the resctrl_arch_ functions will be re-used for the ABMC
> equivalent functions.
>
> Add the rounding helper for checking monitor values while we're here.
>
> Tested-by: Gavin Shan <gshan at redhat.com>
> Tested-by: Shaopeng Tan <tan.shaopeng at jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman at google.com>
> Tested-by: Zeng Heng <zengheng4 at huawei.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng at jp.fujitsu.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron at huawei.com>
> Signed-off-by: James Morse <james.morse at arm.com>
> Signed-off-by: Ben Horgan <ben.horgan at arm.com>
> ---
[...]
> +
> +static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
> + enum mpam_device_features mon_type,
> + int mon_idx, u32 closid, u32 rmid, u64 *val)
> +{
> + if (cdp_enabled) {
While reviewing the resctrl limbo handling code, I noticed a issue in
__check_limbo() that could lead to premature RMID release when CDP is
enabled.
In __check_limbo(), RMIDs in limbo state undergo L3 occupancy checks
before being released. This check is performed via
resctrl_arch_rmid_read(), on arm64 MPAM, which relies on the cdp_enabled
state to determine to check which PARTID.
The concern arises in the following scenario: Filesystem is mounted with
CDP enabled. During normal operation, some RMIDs enter limbo. On umount,
cdp_enabled is reset to false. __check_limbo() may then run and perform
L3 checks with cdp_enabled = false. This could cause RMIDs to be
incorrectly released from limbo while still effectively busy after
remount.
Apologies for not providing a ready-made fix in this email. However,
I would appreciate to hear the community's thoughts on this issue.
> + u64 code_val = 0, data_val = 0;
> + int err;
> +
> + err = __read_mon(mon, mon_comp, mon_type, mon_idx,
> + CDP_CODE, closid, rmid, &code_val);
> + if (err)
> + return err;
> +
> + err = __read_mon(mon, mon_comp, mon_type, mon_idx,
> + CDP_DATA, closid, rmid, &data_val);
> + if (err)
> + return err;
> +
> + *val += code_val + data_val;
> + return 0;
> + }
> +
> + return __read_mon(mon, mon_comp, mon_type, mon_idx,
> + CDP_NONE, closid, rmid, val);
> +}
> +
> +/* MBWU when not in ABMC mode, and CSU counters. */
> +int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> + u32 closid, u32 rmid, enum resctrl_event_id eventid,
> + void *arch_priv, u64 *val, void *arch_mon_ctx)
> +{
> + struct mpam_resctrl_dom *l3_dom;
> + struct mpam_component *mon_comp;
> + u32 mon_idx = *(u32 *)arch_mon_ctx;
> + enum mpam_device_features mon_type;
> + struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
> +
> + resctrl_arch_rmid_read_context_check();
> +
> + if (eventid >= QOS_NUM_EVENTS || !mon->class)
> + return -EINVAL;
> +
> + l3_dom = container_of(hdr, struct mpam_resctrl_dom, resctrl_mon_dom.hdr);
> + mon_comp = l3_dom->mon_comp[eventid];
> +
> + switch (eventid) {
> + case QOS_L3_OCCUP_EVENT_ID:
> + mon_type = mpam_feat_msmon_csu;
> + break;
> + case QOS_L3_MBM_LOCAL_EVENT_ID:
> + case QOS_L3_MBM_TOTAL_EVENT_ID:
> + mon_type = mpam_feat_msmon_mbwu;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return read_mon_cdp_safe(mon, mon_comp, mon_type, mon_idx,
> + closid, rmid, val);
> +}
> +
Best regards,
Zeng Heng
More information about the linux-arm-kernel
mailing list