[PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
Drew Fustini
fustini at kernel.org
Sat May 16 18:38:59 PDT 2026
On Tue, May 12, 2026 at 09:26:53PM +0000, sashiko-bot at kernel.org wrote:
> > @@ -308,6 +348,76 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> > mutex_unlock(&ctrl->lock);
> > return err;
> >
> > + case QOS_L3_MBM_TOTAL_EVENT_ID: {
> > + struct cbqri_controller *bc;
> > +
> > + /*
> > + * The L3 monitoring domain's id is the L3 cache id. The
> > + * matching ctrl domain's hw_dom->paired_bc was cached at
> > + * add time to avoid walking cbqri_controllers on every read.
> > + */
> > + d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
>
> Is there a list traversal race here that could happen during CPU hotplug?
>
> When offlining a CPU, the domain removal process uses list_del_rcu() and frees
> domains after calling synchronize_rcu(). However, cbqri_find_ctrl_domain()
> traverses the ctrl_domains list without holding an rcu_read_lock().
>
> If synchronize_rcu() does not wait for this reader, could this lead to a
> use-after-free when the freed domain struct is accessed?
I'll change it to take cbqri_domain_list_lock in
resctrl_arch_rmid_read() and resctrl_arch_reset_rmid(), and drop the RCU
on the writer side in favor of plain list_del under the same mutex.
> > @@ -730,6 +840,16 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
> > res->mon.num_rmid = ctrl->mcid_count;
> > resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
> > false, 0, NULL);
> > +
> > + /*
> > + * Expose BC bandwidth monitoring as the L3's
> > + * mbm_total_bytes when a BC shares topology with this
> > + * L3 (MPAM "MB on L3" mapping).
> > + */
> > + if (cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID])
> > + resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID,
> > + false, 0, NULL);
> > +
> > res->mon_capable = true;
> > }
> > break;
>
> Is bandwidth monitoring inadvertently disabled if the L3 capacity controller
> lacks occupancy monitoring?
>
> Because this block is inside an if condition checking ctrl->mon_capable where
> ctrl is the L3 capacity controller, it appears the bandwidth monitoring won't
> be exposed even if a valid and monitoring-capable bandwidth controller is
> present on the system.
The L3 mon_domain is currently anchored on the L3 CC's mon_capable path,
so a system with a mon-capable BC but a mon-incapable L3 CC loses
mbm_total_bytes. That can be addressed in a future series.
> > @@ -949,6 +1072,29 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
> > if (err)
> > goto err_offline;
> >
> > + /*
> > + * Pair this L3 domain with the system's mon-capable BC. The
> > + * cached pointer is consulted by every rmid_read / reset_rmid.
> > + * BC mon init is system-wide. Failure here means mbm_total_bytes
> > + * is unusable for any domain, so fail the attach.
> > + */
> > + {
> > + struct cbqri_resctrl_dom *hw_dom = container_of(ctrl_dom,
> > + struct cbqri_resctrl_dom,
> > + resctrl_ctrl_dom);
> > +
> > + hw_dom->paired_bc = cbqri_find_only_mon_bc();
> > + if (hw_dom->paired_bc) {
> > + err = cbqri_init_bc_mon_counters(hw_dom->paired_bc);
>
> Could this sequence result in a use-after-free or a null pointer dereference?
>
> By the time this block runs, resctrl_online_mon_domain() has already been
> called earlier in the function, exposing the monitoring domain to user space.
> If a concurrent sysfs read triggers resctrl_arch_rmid_read() at this moment,
> it could pass the lockless check for bc->mbm_total_states and block waiting
> for the hardware lock.
>
> If cbqri_init_bc_mon_counters() fails, it drops the lock, frees
> bc->mbm_total_states, and sets it to null.
>
> When the sysfs read unblocks, will it dereference the freed pointer? Might it
> be safer to defer domain onlining until after the hardware state is fully
> initialized?
I will change it so cbqri_init_bc_mon_counters() and the paired_bc
assignment happen before resctrl_online_mon_domain(), so no reader can
observe the half-initialized state.
>
> Furthermore, does this unconditional assignment of cbqri_find_only_mon_bc()
> cause bandwidth overcounting on platforms with multiple L3 domains?
>
> If the single system-wide bandwidth controller is paired with every L3
> domain, standard user-space tools that calculate group bandwidth by summing
> values across all L3 domains might overcount the true system bandwidth by
> a factor equal to the number of active L3 domains.
Only a single bandwidth controller is supported with this implemenation.
cbqri_resctrl_pick_counters() counts distinct L3 cache_ids at setup and
refuses to expose mbm_total_bytes when more than one L3 domain is
present.
-Drew
More information about the linux-riscv
mailing list