resctrl2 - status
Moger, Babu
bmoger at amd.com
Fri Sep 8 14:35:05 PDT 2023
Hi Tony,
On 9/8/2023 1:51 PM, Luck, Tony wrote:
>>> Can you try this out on an AMD system. I think I covered most of the
>>> existing AMD resctrl features, but I have no machine to test the code
>>> on, so very likely there are bugs in these code paths.
>>>
>>> I'd like to make any needed changes now, before I start breaking this
>>> into reviewable bite-sized patches to avoid too much churn.
>> I tried your latest code briefly on my system. Unfortunately, I could
>> not get it to work on my AMD system.
>>
>> # git branch -a
>> next
>> * resctrl2_v65
>> # uname -r
>> 6.5.0+
>> #lsmod |grep rdt
>> rdt_show_ids 12288 0
>> rdt_mbm_local_bytes 12288 0
>> rdt_mbm_total_bytes 12288 0
>> rdt_llc_occupancy 12288 0
>> rdt_l3_cat 16384 0
>>
>> # lsmod |grep mbe
>> amd_mbec 16384 0
>>
>> I could not get rdt_l3_mba
>>
>> # modprobe rdt_l3_mba
>> modprobe: ERROR: could not insert 'rdt_l3_mba': No such device
>>
>> I don't see any data for the default group either.
>>
>> mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> cd /sys/fs/resctrl/mon_data/mon_L3_00
>>
>> cat mbm_summary
>> n/a n/a /
> Babu,
>
> Thanks a bunch for taking this for a quick spin. There are several bits of
> good news there. Several modules automatically loaded as expected.
> Nothing went "OOPS" and crashed the system.
>
> Here’s the code that the rdt_l3_mba module runs that can cause failure
> to load with "No such device"
>
> if (!boot_cpu_has(X86_FEATURE_RDT_A)) {
> pr_debug("No RDT allocation support\n");
> return -ENODEV;
> }
Shouldn't this be something like the following (or similar)?
if (!rdt_cpu_has(X86_FEATURE_MBA))
	return false;
> mba_features = cpuid_ebx(0x10);
>
> if (!(mba_features & BIT(3))) {
> pr_debug("No RDT MBA allocation\n");
> return -ENODEV;
> }
>
> I assume the first test must have succeeded (same code in rdt_l3_cat, and
> that loaded OK). So must be the second. How does AMD enumerate MBA
> support?
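
For reference, the second check in the quoted code is a single bit test on the EBX value returned by CPUID leaf 0x10. A minimal sketch of that test follows, assuming a hypothetical helper name (ebx_reports_mba does not come from the resctrl2 code). It only restates the quoted check and says nothing about where AMD reports MBA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of CPUID leaf 0x10 EBX: the bit the quoted module code tests. */
#define MBA_BIT_LEAF_0X10 (UINT32_C(1) << 3)

/*
 * Hypothetical helper (name is an assumption, not from resctrl2):
 * returns true when the supplied EBX value has the MBA bit set.
 */
static bool ebx_reports_mba(uint32_t ebx)
{
	return (ebx & MBA_BIT_LEAF_0X10) != 0;
}
```

If the hardware leaves that bit clear in leaf 0x10, this test fails and the module load ends in -ENODEV, which matches the "No such device" error above.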
>
> Less obvious is the root cause of the mbm_summary file failing to
> show any data. The rdt_mbm_local_bytes and rdt_mbm_total_bytes modules
> loaded OK. So I'm looking for the right CPUID bits to detect memory bandwidth
> monitoring.
I am still not sure if resctrl2 will address all the current gaps in
resctrl1. We should probably list all issues on the table before we go
that route.
One of the main issues for AMD is the coupling of LLC domains.
For example, AMD hardware supports 16 CLOSIDs per LLC domain. But the Linux
design assumes that there are 16 CLOSIDs in total for the whole
system. We can only create 16 CLOSIDs now, irrespective of how many
domains there are.
In reality, we should be able to create "16 x number of LLC domains"
CLOSIDs in the system. This is more evident on AMD, but the same problem
applies to Intel systems with multiple sockets.
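
To make the arithmetic concrete, here is a small sketch of the two accounting schemes. The function names are hypothetical and only model the counting; they are not taken from resctrl or resctrl2.

```c
#include <stdint.h>

/*
 * Hypothetical model (names are assumptions, not kernel code).
 *
 * Current design: one global pool of CLOSIDs, so the number of
 * LLC domains does not change how many can be created.
 */
static uint32_t closids_global(uint32_t per_domain, uint32_t num_domains)
{
	(void)num_domains;	/* ignored: the pool is shared system-wide */
	return per_domain;
}

/* Per-domain accounting: every LLC domain contributes its own pool. */
static uint32_t closids_per_domain(uint32_t per_domain, uint32_t num_domains)
{
	return per_domain * num_domains;
}
```

With 16 CLOSIDs per domain and, purely as an illustrative count, 8 LLC domains, closids_global(16, 8) gives 16 while closids_per_domain(16, 8) gives 128.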
My 2 cents. I hope to discuss this more in our upcoming meeting.
thanks