Testing DDR MPAM monitor (mbm_total_bytes)

Peter Newman peternewman at google.com
Tue Sep 12 01:56:18 PDT 2023


Hi Amit,

On Sun, Sep 10, 2023 at 8:58 PM Amit Singh Tomar <amitsinght at marvell.com> wrote:
>
> Hi James,
>
> We have two types of MSCs:
>
> 1) L3 MSC with controls/features:
>     "mpam_feat_cpor_part", and "mpam_feat_msmon_csu",
>
> 2) DDR MSC with controls/features:
>     "mpam_feat_mbw_min", "mpam_feat_mbw_max", "mpam_feat_msmon_mbwu", "mpam_feat_msmon_mbwu_63counter", and "mpam_feat_msmon_mbwu_rwbw".
>
> I am trying to test the DDR MPAM monitors (the DDR MSC's mpam_feat_msmon_mbwu feature) using your latest snapshot[1] (mpam/snapshot/v6.5-rc1), but found a few issues.
>
> 1) When trying to mount the resctrl filesystem, I see the following stack trace:
>
> root at localhost:~# mount -t resctrl resctrl -o this_is_not_abi /sys/fs/resctrl
> mount: /sys/fs/resctrl: permission denied.
> root at localhost:~# dmesg | tail -33
> [   36.719569] ------------[ cut here ]------------
> [   36.719571] WARNING: CPU: 23 PID: 786 at fs/resctrl/rdtgroup.c:3109 mkdir_mondata_subdir+0x214/0x228
> [   36.719579] Modules linked in:
> [   36.719580] CPU: 23 PID: 786 Comm: mount Not tainted 6.5.0-rc1-g9f0a8101361c-dirty #2
> [   36.719582] Hardware name: Marvell SP1W5NXX board (DT)
> [   36.719582] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> [   36.719584] pc : mkdir_mondata_subdir+0x214/0x228
> [   36.719586] lr : mkdir_mondata_subdir+0x90/0x228
> [   36.719588] sp : ffff800082d73b00
> [   36.719589] x29: ffff800082d73b10 x28: ffff00011dc1bc00 x27: 0000000000000000
> [   36.719591] x26: ffffa9a19cbe5388 x25: ffffa9a19cba5880 x24: ffff000103364008
> [   36.719593] x23: ffffa9a19cbe5320 x22: ffff00010675c280 x21: ffff00010675c480
> [   36.719594] x20: 000000000675c480 x19: ffff00010675c280 x18: 0000000000000030
> [   36.719596] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
> [   36.719598] x14: ffff800102d73bb7 x13: 0000000000000002 x12: ffff800082d73bc0
> [   36.719599] x11: 0000000000000001 x10: 000000000000005f x9 : 0000000000000001
> [   36.719601] x8 : 0101010101010101 x7 : ffff7f7f7fff7f7f x6 : fefdff0005c7ff2f
> [   36.719603] x5 : 0000800000008000 x4 : 0000000000000006 x3 : 0000000000000000
> [   36.719604] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffa9a19cbe5388
> [   36.719606] Call trace:
> [   36.719607]  mkdir_mondata_subdir+0x214/0x228
> [   36.719609]  mkdir_mondata_all+0xb0/0x110
> [   36.719612]  rdt_get_tree+0x218/0x500
> [   36.719614]  vfs_get_tree+0x28/0xec
> [   36.719616]  path_mount+0x3d4/0xa4c
> [   36.719618]  __arm64_sys_mount+0x1d4/0x2b0
> [   36.719619]  invoke_syscall+0x48/0x114
> [   36.719622]  el0_svc_common.constprop.0+0x44/0xe4
> [   36.719625]  do_el0_svc+0x38/0xa4
> [   36.719627]  el0_svc+0x2c/0x84
> [   36.719630]  el0t_64_sync_handler+0xc0/0xc4
> [   36.719632]  el0t_64_sync+0x190/0x194
> [   36.719633] ---[ end trace 0000000000000000 ]---
>
> This is because the event list for the MBA resource is empty [1] (the event list is created only for the L3 resource), so we hit the warning here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/fs/resctrl/rdtgroup.c?h=mpam/snapshot/v6.5-rc1#n3109
>
> With the following change, I managed to mount the resctrl filesystem:
>
> # git diff fs/resctrl/rdtgroup.c
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 43efed317f1b..655757183b84 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -3096,6 +3096,9 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>         char name[32];
>         int ret;
>
> +       if (!strcmp(r->name, "MB"))
> +               return 0;
> +
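A minimal userspace sketch of the failure mode and of the quoted workaround, assuming simplified hypothetical structures (resource_model, mon_evt_model, mkdir_mondata_model) rather than the real resctrl types. It contrasts the name check from the patch above with a guard on the event list itself. Returning -EPERM for the empty-list case is chosen only to match the "permission denied" reported by mount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for resctrl structures. */
struct mon_evt_model {
	const char *name;              /* e.g. "llc_occupancy" */
};

struct resource_model {
	const char *name;              /* e.g. "L3" or "MB" */
	bool mon_capable;              /* resource advertises monitoring */
	const struct mon_evt_model *evts;
	size_t nr_evts;                /* length of the event list */
};

/* Name-based guard, equivalent to the quoted patch. */
static bool skip_by_name(const struct resource_model *r)
{
	return strcmp(r->name, "MB") == 0;
}

/* Property-based guard: skip any resource whose event list is empty. */
static bool skip_if_no_events(const struct resource_model *r)
{
	return r->nr_evts == 0;
}

/*
 * Model of creating one resource's mon_data entries. Returns the number of
 * event files that would be created, 0 if the resource is skipped, or
 * -EPERM for a mon_capable resource with an empty event list (the path
 * that warns in the report).
 */
static int mkdir_mondata_model(const struct resource_model *r,
			       bool (*skip)(const struct resource_model *))
{
	assert(r != NULL);
	if (!r->mon_capable)
		return 0;
	if (skip && skip(r))
		return 0;
	if (r->nr_evts == 0)
		return -EPERM;
	return (int)r->nr_evts;
}
```

With an "MB" resource that is mon_capable but has no events, the unguarded model returns -EPERM and both guards return 0. If the same resource did carry an "mbm_total_bytes" event (the DDR MBWU counters), the name check would hide it while the event-list check would still create it, so a name comparison only masks the underlying question of which resource the DDR counters belong to.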

Can you explain why it's a problem for the memory bandwidth allocation
(MBA) resource to not have any monitoring events? All of the
monitoring events are on the L3 resource on the existing x86
implementations and this doesn't lead to any issues.

You explained earlier that two different MSCs have monitoring
capabilities on this hardware and this crash report is suggesting that
some monitoring events should have been assigned to the MBA resource.

The common resctrl code, especially the implementation of the RMID
limbo list, assumes a single monitoring resource in the system. That
is, a single resource is expected to be able to tell you the memory
bandwidth AND LLC occupancy of an RMID. That shouldn't be hard to fix,
but if the LLC occupancy and MBM domains didn't align, fixing the
limbo mechanism would become much more difficult.
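The limbo behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the resctrl implementation: the names (limbo_model, free_rmid_model, check_limbo_model, rmid_reusable) and the fixed domain and RMID counts are hypothetical. A freed RMID stays busy in every domain where its occupancy is above a reuse threshold, and it can be reallocated only once no domain still holds it. The domain array belongs to one resource, which is the single-monitoring-resource assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_DOMAINS_MODEL 4 /* hypothetical: domains of the one monitoring resource */
#define NR_RMIDS_MODEL   8 /* hypothetical number of RMIDs */

struct limbo_model {
	/* Occupancy in bytes reported per (domain, RMID). */
	unsigned long occupancy[NR_DOMAINS_MODEL][NR_RMIDS_MODEL];
	/* true while the RMID is still in limbo in that domain. */
	bool busy[NR_DOMAINS_MODEL][NR_RMIDS_MODEL];
	/* Occupancy at or below this value counts as clean. */
	unsigned long threshold;
};

/* On free, mark the RMID busy in each domain where it still occupies cache. */
static void free_rmid_model(struct limbo_model *m, unsigned int rmid)
{
	assert(rmid < NR_RMIDS_MODEL);
	for (int d = 0; d < NR_DOMAINS_MODEL; d++)
		m->busy[d][rmid] = m->occupancy[d][rmid] > m->threshold;
}

/* Periodic check: clear busy bits whose occupancy has dropped to the threshold or below. */
static void check_limbo_model(struct limbo_model *m)
{
	for (int d = 0; d < NR_DOMAINS_MODEL; d++)
		for (unsigned int rmid = 0; rmid < NR_RMIDS_MODEL; rmid++)
			if (m->busy[d][rmid] && m->occupancy[d][rmid] <= m->threshold)
				m->busy[d][rmid] = false;
}

/* An RMID may be reallocated only once no domain still holds it in limbo. */
static bool rmid_reusable(const struct limbo_model *m, unsigned int rmid)
{
	assert(rmid < NR_RMIDS_MODEL);
	for (int d = 0; d < NR_DOMAINS_MODEL; d++)
		if (m->busy[d][rmid])
			return false;
	return true;
}
```

If the counters were spread over two resources whose domains did not line up, a structure like this would presumably need separate busy state per resource plus a way to relate their domains, which is the extra difficulty anticipated above.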

Can you start by explaining why monitoring events should be
assigned to different resources? What would have gone wrong if the DDR
MSCs' bandwidth event counts were attached to the L3 resource? At a
high level, the concept of a "resource" in resctrl seems abstract
enough that it's difficult for me to understand why the MPAM code
would choose to arrange things this way.

Thanks!
-Peter



More information about the linux-arm-kernel mailing list