[PATCH v5 38/41] arm_mpam: Add workaround for T241-MPAM-4
Fenghua Yu
fenghuay at nvidia.com
Mon Mar 9 10:39:54 PDT 2026
Hi, Ben,
On 3/2/26 09:11, Ben Horgan wrote:
> Hi Fenghua,
>
> On 3/1/26 17:28, Fenghua Yu wrote:
>> Hi, Ben,
>>
>> On 2/24/26 09:57, Ben Horgan wrote:
>>> From: Shanker Donthineni <sdonthineni at nvidia.com>
>>>
>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>> absence
>>> of contention for bandwidth, the minimum bandwidth setting can affect the
>>> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
>>> absence of contention can settle to any value between the values of
>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
>>> zero (below 0.78125%), once a core enters a throttled state, it will
>>> never
>>> leave that state.
>>>
>>> The first issue is not a concern if the MPAM software allows to program
>>> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
>>> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
>>>
>>> In the scenario where the resctrl doesn't support the MBW_MIN
>>> interface via
>>> sysfs, to achieve bandwidth closer to MBW_MAX in the absence of
>>> contention,
>>> software should configure a relatively narrow gap between MBW_MIN and
>>> MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
>>>
>>> Clear the feature MBW_MIN feature from the class to ensure we don't
>>> accidentally change behaviour when resctrl adds support for a MBW_MIN
>>> interface.
>>>
>>> Tested-by: Gavin Shan <gshan at redhat.com>
>>> Tested-by: Shaopeng Tan <tan.shaopeng at jp.fujitsu.com>
>>> Reviewed-by: Shaopeng Tan <tan.shaopeng at jp.fujitsu.com>
>>> Signed-off-by: Shanker Donthineni <sdonthineni at nvidia.com>
>>> Signed-off-by: James Morse <james.morse at arm.com>
>>> Signed-off-by: Ben Horgan <ben.horgan at arm.com>
>>
>> Reviewed-by: Fenghua Yu <fenghuay at nvidia.com>
>>
>> This patch itself is good.
>>
>> Please check the following comments.
>>
>>> ---
>>> [ morse: Added as second quirk, adapted to use the new intermediate
>>> values
>>> in mpam_extend_config() ]
>>>
>>> Changes since rfc:
>>> MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
>>> Handling when reset_mbw_min is set
>>>
>>> Changes since v3:
>>> Move the 5% gap policy back here
>>> Clear mbw_min feature in class
>>> ---
>>> Documentation/arch/arm64/silicon-errata.rst | 2 +
>>> drivers/resctrl/mpam_devices.c | 50 +++++++++++++++++++--
>>> drivers/resctrl/mpam_internal.h | 1 +
>>> 3 files changed, 50 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/
>>> Documentation/arch/arm64/silicon-errata.rst
>>> index a65620f98e3a..a4b246655e37 100644
>>> --- a/Documentation/arch/arm64/silicon-errata.rst
>>> +++ b/Documentation/arch/arm64/silicon-errata.rst
>>> @@ -249,6 +249,8 @@ stable kernels.
>>> +----------------+-----------------+-----------------
>>> +-----------------------------+
>>> | NVIDIA | T241 MPAM | T241-MPAM-1 | N/
>>> A |
>>> +----------------+-----------------+-----------------
>>> +-----------------------------+
>>> +| NVIDIA | T241 MPAM | T241-MPAM-4 | N/
>>> A |
>>> ++----------------+-----------------+-----------------
>>> +-----------------------------+
>>> +----------------+-----------------+-----------------
>>> +-----------------------------+
>>> | Freescale/NXP | LS2080A/LS1043A | A-008585 |
>>> FSL_ERRATUM_A008585 |
>>> +----------------+-----------------+-----------------
>>> +-----------------------------+
>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>> mpam_devices.c
>>> index 08cb080592d9..8f44e9dee207 100644
>>> --- a/drivers/resctrl/mpam_devices.c
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> @@ -679,6 +679,12 @@ static const struct mpam_quirk mpam_quirks[] = {
>>> .iidr_mask = MPAM_IIDR_MATCH_ONE,
>>> .workaround = T241_SCRUB_SHADOW_REGS,
>>> },
>>> + {
>>> + /* NVIDIA t241 erratum T241-MPAM-4 */
>>> + .iidr = MPAM_IIDR_NVIDIA_T241,
>>> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
>>> + .workaround = T241_FORCE_MBW_MIN_TO_ONE,
>>> + },
>>> { NULL } /* Sentinel */
>>> };
>>> @@ -1464,6 +1470,31 @@ static void
>>> mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
>>> mpam_apply_t241_erratum(ris, partid);
>>> }
>>> +static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
>>> +{
>>> + u16 max_hw_value, min_hw_granule, res0_bits;
>>> +
>>> + res0_bits = 16 - props->bwa_wd;
>>> + max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
>>> + min_hw_granule = ~max_hw_value;
>>> +
>>> + return min_hw_granule + 1;
>>> +}
>>> +
>>> +static u16 mpam_wa_t241_calc_min_from_max(struct mpam_config *cfg)
>>> +{
>>> + u16 val = 0;
>>> +
>>> + if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
>>
>> But the problem is mpam_feat_mbw_max feature is NOT set in cfg.
>>
>>> + u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>>> +
>>> + if (cfg->mbw_max > delta)
>>> + val = cfg->mbw_max - delta;
>>> + }
>>> +
>>> + return val;
>>
>> So 0 is always returned.
>>
>> The workaround will set mbw_min as 1% which is too small and will cause
>> performance degradation, e.g. about 20% degradation on some benchmarks.
>>
>> This patch itself doesn't have any issue.
>>
>> The issue is the mbw_max feature bit in cfg is not set.
>
> This is intended behaviour as the reset is done independently
> from the value set in the config. The value is there so that
> resctrl can display the expected values.
>
>> This is a legacy issue, not introduced by this patch set.
>>> Here is a fix patch for the issue:
>> https://lore.kernel.org/lkml/20260301171829.1357886-1-
>> fenghuay at nvidia.com/T/#u
>
> I've commented on that patch. I think it's best to fix it in the context
> of the erratum.
>
> Does the below solve your performance problems?
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 236f78ab9163..60d3d3e2193f 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1515,16 +1515,20 @@ static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
> return min_hw_granule + 1;
> }
>
> -static u16 mpam_wa_t241_calc_min_from_max(struct mpam_config *cfg)
> +static u16 mpam_wa_t241_calc_min_from_max(struct mpam_props *props,
> + struct mpam_config *cfg)
> {
> u16 val = 0;
> + u16 max;
> + u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>
> - if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
> - u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
> + if (mpam_has_feature(mpam_feat_mbw_max, cfg))
> + max = cfg->mbw_max;
> + else
> + max = GENMASK(15, 16 - cprops->bwa_wd);
>
> - if (cfg->mbw_max > delta)
> - val = cfg->mbw_max - delta;
> - }
Could you please add some comments on this piece of code? It's worth to
comment on why there are different values on cfg and props.
> + if (max > delta)
> + val = max - delta;
>
> return val;
> }
> @@ -1577,9 +1581,8 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, msc)) {
> u16 min = mpam_wa_t241_force_mbw_min_to_one(rprops);
>
> - val = mpam_wa_t241_calc_min_from_max(cfg);
> - if (val < min)
> - val = min;
> + val = mpam_wa_t241_calc_min_from_max(rprops, cfg);
> + val = max(val, min);
> }
>
> mpam_write_partsel_reg(msc, MBW_MIN, val);
>
Otherwise, this change looks good to me.
Thanks.
-Fenghua
More information about the linux-arm-kernel
mailing list