resctrl2 - status

Jonathan Cameron Jonathan.Cameron at Huawei.com
Mon Sep 18 03:44:20 PDT 2023


On Fri, 15 Sep 2023 10:55:58 -0700
Drew Fustini <dfustini at baylibre.com> wrote:

> On Fri, Sep 08, 2023 at 04:13:54PM -0700, Tony Luck wrote:
> > On Fri, Sep 08, 2023 at 04:35:05PM -0500, Moger, Babu wrote:  
> > > Hi Tony,
> > > 
> > > 
> > > On 9/8/2023 1:51 PM, Luck, Tony wrote:  
> > > > > > Can you try this out on an AMD system? I think I covered most of the
> > > > > > existing AMD resctrl features, but I have no machine to test the code
> > > > > > on, so very likely there are bugs in these code paths.
> > > > > > 
> > > > > > I'd like to make any needed changes now, before I start breaking this
> > > > > > into reviewable bite-sized patches to avoid too much churn.  
> > > > > I tried your latest code briefly on my system.  Unfortunately, I could
> > > > > not get it to work on my AMD system.
> > > > > 
> > > > > # git branch -a
> > > > >     next
> > > > > * resctrl2_v65
> > > > > > # uname -r
> > > > > 6.5.0+
> > > > > #lsmod |grep rdt
> > > > > rdt_show_ids           12288  0
> > > > > rdt_mbm_local_bytes    12288  0
> > > > > rdt_mbm_total_bytes    12288  0
> > > > > rdt_llc_occupancy      12288  0
> > > > > rdt_l3_cat             16384  0
> > > > > 
> > > > > # lsmod |grep mbe
> > > > > amd_mbec               16384  0
> > > > > 
> > > > > > I could not get rdt_l3_mba to load.
> > > > > 
> > > > > # modprobe rdt_l3_mba
> > > > > modprobe: ERROR: could not insert 'rdt_l3_mba': No such device
> > > > > 
> > > > > I don't see any data for the default group either.
> > > > > 
> > > > > mount  -t resctrl resctrl /sys/fs/resctrl/
> > > > > 
> > > > > cd /sys/fs/resctrl/mon_data/mon_L3_00
> > > > > 
> > > > > cat mbm_summary
> > > > >        n/a      n/a /  
> > > > Babu,
> > > > 
> > > > Thanks a bunch for taking this for a quick spin. There are several bits of
> > > > good news there. Several modules automatically loaded as expected.
> > > > Nothing went "OOPS" and crashed the system.
> > > > 
> > > > Here’s the code that the rdt_l3_mba module runs that can cause failure
> > > > to load with "No such device"
> > > > 
> > > >          if (!boot_cpu_has(X86_FEATURE_RDT_A)) {
> > > >                  pr_debug("No RDT allocation support\n");
> > > >                  return -ENODEV;
> > > >          }  
> > > 
> > > Shouldn't this be (or similar)?
> > > 
> > > if (!rdt_cpu_has(X86_FEATURE_MBA))
> > >                 return false;  
> > 
> > Yes. I should be using X86_FEATURE bits where they are available
> > rather than peeking directly at CPUID register bits.
> >   
> > >   
> > > >          mba_features = cpuid_ebx(0x10);
> > > > 
> > > >          if (!(mba_features & BIT(3))) {
> > > >                  pr_debug("No RDT MBA allocation\n");
> > > >                  return -ENODEV;
> > > >          }
> > > > 
> > > > I assume the first test must have succeeded (same code in rdt_l3_cat, and
> > > > that loaded OK). So it must be the second test that fails. How does AMD
> > > > enumerate MBA support?
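
For anyone wanting to poke at the second test from user space, here is a
minimal sketch of the same check. It assumes a GCC or Clang toolchain that
provides <cpuid.h>; the helper names (leaf10_ebx_has_mba, read_leaf10_ebx) are
made up for illustration and are not part of the resctrl2 code. It only
mirrors the leaf 0x10 / EBX bit 3 test quoted above and says nothing about how
AMD enumerates MBA.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit 3 of CPUID.(EAX=0x10, ECX=0):EBX is the MBA bit tested in the thread. */
#define LEAF10_EBX_MBA_BIT 3u

/* Hypothetical helper: does this leaf 0x10 EBX value advertise MBA? */
static inline bool leaf10_ebx_has_mba(uint32_t ebx)
{
	return (ebx >> LEAF10_EBX_MBA_BIT) & 1u;
}

/*
 * Hypothetical helper: read leaf 0x10, subleaf 0 from user space.
 * Returns false if the leaf is not available, or if this was not
 * built for x86 at all.
 */
static inline bool read_leaf10_ebx(uint32_t *ebx_out)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x10, 0, &eax, &ebx, &ecx, &edx))
		return false;
	*ebx_out = ebx;
	return true;
#else
	(void)ebx_out;
	return false;
#endif
}
```

Calling read_leaf10_ebx() and passing the result to leaf10_ebx_has_mba() on
the failing machine would show whether that bit is clear there, which is what
the "No such device" error suggests.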
> > > > 
> > > > It is less obvious why the mbm_summary file fails to show any data.
> > > > The rdt_mbm_local_bytes and rdt_mbm_total_bytes modules loaded OK.
> > > > So I'm looking for the right CPUID bits to detect memory bandwidth
> > > > monitoring.  
> > > 
> > > I am still not sure if resctrl2 will address all the current gaps in
> > > resctrl1. We should probably list all issues on the table before we go that
> > > route.  
> > 
> > Indeed yes! I don't want to have to do resctrl3 in a few years to
> > cover gaps that could have been addressed in resctrl2.
> > 
> > However, fixing resctrl gaps is only one of the motivations for
> > the rewrite. The bigger one is making life easier for all the
> > architectures sharing the common code to do what they need to
> > for their own quirks & differences without cluttering the
> > common code base, or worrying "did my change just break something
> > for another CPU architecture".
> >   
> > > One of the main issues for AMD is coupling of LLC domains.
> > > 
> > > For example, AMD hardware supports 16 CLOSIDs per LLC domain. But the Linux
> > > design assumes that there are globally 16 total CLOSIDs for the whole
> > > system. We can only create 16 CLOSIDs now, irrespective of how many domains
> > > there are.
> > > 
> > > In reality, we should be able to create "16 x number of LLC domains" CLOSIDs
> > > in the system.  This is more evident on AMD, but the same problem applies to
> > > Intel with multiple sockets.  
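
The arithmetic in the paragraph above can be written down as a small sketch.
The two function names are hypothetical and only model the counting argument;
they are not taken from resctrl or resctrl2.

```c
#include <stddef.h>

/*
 * Global model, as described above: one CLOSID space is shared by every
 * domain, so the number of usable CLOSIDs does not grow with the number
 * of domains.
 */
static inline unsigned int closids_global_model(unsigned int per_domain,
						size_t nr_domains)
{
	(void)nr_domains;	/* deliberately unused in this model */
	return per_domain;
}

/*
 * Per-domain model: each LLC domain has its own independent CLOSID
 * space, so the totals add up across domains.
 */
static inline unsigned int closids_per_domain_model(unsigned int per_domain,
						    size_t nr_domains)
{
	return per_domain * (unsigned int)nr_domains;
}
```

With 16 CLOSIDs per domain and 4 domains, the first model gives 16 and the
second gives 64.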
> > 
> > I think this can be somewhat achieved already with a combination of
> > resctrl and cpusets (or some other way to set CPU affinity for tasks
> > to only run on CPUs within a specific domain or set of domains).
> > That's why the schemata file allows setting different CBM masks
> > per domain.
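
As an illustration of the per-domain masks mentioned above, here is a rough
sketch that builds an L3 schemata line of the form "L3:0=ffff;1=ff". The
function name format_l3_schemata is hypothetical, and the sketch assumes
domain ids are simply 0..nr_domains-1, which need not hold on real hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one L3 schemata line with a separate
 * cache bit mask (CBM) for each domain.
 *
 * Returns the number of characters written (not counting the NUL),
 * or -1 if buf is too small.
 */
static inline int format_l3_schemata(char *buf, size_t len,
				     const uint32_t *cbm, size_t nr_domains)
{
	size_t off;
	int n;

	n = snprintf(buf, len, "L3:");
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	for (size_t i = 0; i < nr_domains; i++) {
		/* Entries are separated by ';', each is "<domain id>=<hex mask>". */
		n = snprintf(buf + off, len - off, "%s%zu=%x",
			     i ? ";" : "", i, (unsigned int)cbm[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	n = snprintf(buf + off, len - off, "\n");
	if (n < 0 || (size_t)n >= len - off)
		return -1;
	return (int)(off + (size_t)n);
}
```

The resulting string is what would be written to a control group's schemata
file. Combined with a cpuset that pins the group's tasks to CPUs in one
domain, only that domain's mask matters for those tasks.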
> > 
> > Can you explain how you would use 64 CLOSIDs on a system with 4 domains
> > and 16 CLOSIDs per domain?
> >   
> > > My 2 cents. Hope to discuss more in our upcoming meeting.  
> > Agreed. This will be faster when we can talk instead of type :-)  
> 
> Is it a meeting that other interested developers can join?
> 
> This reminds me that Linux Plumbers Conference [1] is in November and
> I think resctrl2 could be a good topic. The CFP is still open for Birds
> of a Feather (BoF) proposals [2]. These are free-form get-togethers for
> people wishing to discuss a particular topic, and I have had success
> hosting them in the past for topics like pinctrl and gpio.
> 
> Anyone planning to attend Plumbers?
> 
> I'll be going in person but the virtual option works really well in my
> experience. I had developers and maintainers attending virtually
> participate in my BoF sessions and I felt it was very productive.

FWIW I'm keen and should be there in person.  However, I'm not on the must
be available list for this one ;)   Agree that hybrid worked fine for BoF last
year.

Jonathan


> 
> thanks,
> drew
> 
> [1] https://lpc.events/
> [2] https://lpc.events/event/17/abstracts/
> 