[PATCH v8 next 01/10] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount

Zeng Heng zengheng4 at huawei.com
Wed May 20 05:16:19 PDT 2026


Hi James,

On 2026/5/15 1:06, James Morse wrote:
> Hi Zeng,
> 
> I think this should be a separate patch as its fixing a problem not adding a feature. It's
> not actually relevant to the rest of the series.
> 

The intention behind including this fix is that reqPARTID is affected in
the same way as the original PARTID, because the conversion between RMID
and reqPARTID relies on the `cdp_enabled` variable. Hence, I tried to
resolve this existing problem in this series as well.
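
To illustrate the dependency, here is a minimal user-space sketch, not
the driver code. The names closid_to_partid() and enum conf_type are
hypothetical, and the pairing (data = 2n, code = 2n + 1) is an assumption
made for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the resctrl configuration type. */
enum conf_type { CONF_NONE, CONF_CODE, CONF_DATA };

/*
 * Assumed mapping: without CDP the CLOSID is used as the PARTID directly;
 * with CDP each CLOSID owns a pair of PARTIDs (data = 2n, code = 2n + 1).
 */
static uint32_t closid_to_partid(uint32_t closid, enum conf_type type,
				 bool cdp_enabled)
{
	if (!cdp_enabled)
		return closid;

	return closid * 2 + (type == CONF_CODE ? 1 : 0);
}

/*
 * The same CLOSID converts to a different PARTID depending on the value
 * of the flag at the time of the conversion, which is the window opened
 * when the flag is cleared before the RMIDs are freed.
 */
void umount_window_example(void)
{
	uint32_t before = closid_to_partid(3, CONF_DATA, true);
	uint32_t after = closid_to_partid(3, CONF_DATA, false);

	assert(before != after);
}
```

Under these assumptions, any conversion done after the flag is cleared
reads a different PARTID than the one that was actually in use.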

> On 13/04/2026 09:53, Zeng Heng wrote:
>> This patch fixes a pre-existing issue in the resctrl filesystem teardown
>> sequence where premature clearing of cdp_enabled could lead to MPAM Partid
>> parsing errors.
> 
> resctrl changes need to go via tip, which has a bunch of rules about commit messages,
> see Documentation/process/maintainer-tip.rst
> 
> You end up with a structure describing the current state, e.g:
> | When resctrl is umounted it disables CDP,
> 
> what the problem is, e.g:
> | CLOSID remain in the limbo list, and the mbm monitors continue to be read
> | after umount. MPAM changes the meaning of CLOSID when CDP is enabled/disabled,
> | resulting in out of bounds accesses.
> 
> Then, what you do about it, here you are:
> | Throwing away the limbo list on umount.
> 
> (I don't suggest you take this wording - its just an example)
> 
> "this patch" is a phrase to avoid, acronyms like CLOSID need capitalising, etc.
> 

Thanks for the details, I'll rework the commit to follow these
guidelines.

> 
>> The closid to partid conversion logic inherently depends on the global
>> cdp_enabled state. However, rdt_disable_ctx() clears this flag early in
>> the umount path, while free_rmid() operations will reference after that.
>> This creates a window where partid parsing operates with inconsistent CDP
>> state, potentially makes monitor reads with wrong partid mapping.
>>
>> Additionally, rmid_entry remaining in limbo between mount sessions may
>> trigger potential partid out-of-range errors, leading to MPAM fault
>> interrupts and subsequent MPAM disablement.
> 
> Can you give more details on this. I assume its going from CDP-disable to
> enabled, means MPAM doubles the CLOSID from the stale limbo list, making it
> out of range.
> 

Got it, I will explain that in more detail.
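
As a rough illustration of the scenario you describe, assuming a
hypothetical limit of 8 PARTIDs and that enabling CDP doubles the CLOSID
during conversion (all names below are made up for this sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PARTID 8u	/* hypothetical hardware limit */

/* CLOSIDs available to resctrl: halved when CDP pairs up the PARTIDs. */
static uint32_t num_closid(bool cdp_enabled)
{
	return cdp_enabled ? NUM_PARTID / 2 : NUM_PARTID;
}

/* Assumed conversion of a limbo entry's CLOSID (data half under CDP). */
static uint32_t limbo_closid_to_partid(uint32_t closid, bool cdp_enabled)
{
	return cdp_enabled ? closid * 2 : closid;
}

static bool partid_in_range(uint32_t partid)
{
	return partid < NUM_PARTID;
}

/*
 * CLOSID 6 is valid while CDP is disabled. If its limbo entry survives
 * into a session mounted with CDP enabled, it converts to PARTID 12,
 * which exceeds the assumed limit.
 */
void stale_limbo_example(void)
{
	uint32_t stale_closid = 6;

	assert(stale_closid < num_closid(false));
	assert(partid_in_range(limbo_closid_to_partid(stale_closid, false)));
	assert(!partid_in_range(limbo_closid_to_partid(stale_closid, true)));
}
```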

> 
>> Reorder rdt_kill_sb() to delay rdt_disable_ctx() until after
>> rmdir_all_sub() and resctrl_fs_teardown() complete. This ensures
>> all rmid-related operations finish with correct CDP state.
> 
> 
>> Introduce rdt_flush_limbo() to flush and cancel limbo work before the
>> filesystem teardown completes.
> 
> So, discard the state in the hope we don't need it again.
> What happens if the filesystem is mounted again quickly afterwards?
> Surely we get noisy bandwidth results for ~minutes afterwards?
> 
> 
>> An alternative approach would be to cancel limbo work on umount
> 
> Sounds like a move in the right direction - having bits of resctrl still
> taking CPU time when its not in use is surprising.
> 
> I'd love to eventually remove the limbo worker and have the RMID alloc code
> search the limbo list for a clean RMID when a control/monitor group is created.
> By deferring the work as late as possible, we do less work overall.
> 
> 
>> and restart it on remount with remaked bitmap.
>> However, this would require substantial changes in the resctrl layer to
>> handle CDP state transitions across mount sessions,
> 
> This would be necessary if the limbo timer was stopped on umount too.
> It also covers cases where you kexec and re-mount resctrl.
> 
> I think this is a good idea. I agree its more work.
> 
> 
>> which is beyond the
>> scope of the reqpartid feature work this patchset focuses on.
> 
> Was it a mistake to include it in this series then?
> 
> 
>> The current
>> fix addresses the immediate correctness issue with minimal churn.
> 
> I'm not a fan of papering over problems in resctrl. Could we do it properly
> by rebuilding the limbo list at mount time as you suggested above?
> 
> 

I discussed this with Ben earlier, and the approach of rebuilding the
bitmap on remount was actually his proposal:
https://lore.kernel.org/all/b95077d7-c036-4a8f-8e42-8f1dc0288075@arm.com/


Best regards,
Zeng Heng


