[RFC PATCH] arch_topology: Pre-allocate cacheinfo from primary CPU

Radu Rendec rrendec at redhat.com
Mon Apr 3 14:25:53 PDT 2023


Hello Pierre,

On Fri, 2023-03-31 at 16:49 +0200, Pierre Gondois wrote:
> On 3/31/23 01:32, Radu Rendec wrote:
> 
> > Yes, it's more or less the same thing. The main differences are:
> >   * There is no window when the information in struct cpu_cacheinfo (the
> >     number of cache levels and leaves) is populated but potentially
> >     inaccurate (e.g. in the case of asymmetric CPUs with no DT/ACPI).
> 
> In the current code, fetch_cache_info() allocates the memory and populates
> num_levels/num_leaves from the primary CPU. The information is populated
> later from the secondary CPUs. So there is already a time window when
> the information is inaccurate.

Fair enough. Then this is not something to worry about.

> > Based on what you proposed, I hacked away at the code and came up with
> > the patch below. It works in my particular test case because the CPUs
> > are symmetrical, but doesn't solve the reallocation part.
> > 
> > I can't think of an elegant way to solve this. One thing that comes to
> > mind is change the logic in detect_cache_attributes(). But I am
> > reluctant because this code is architecture independent, while the
> > "guess first/fix later" strategy is implementation specific (arm64 in
> > our case). The main problem is that the call to init_cache_level() is
> > skipped altogether if there is a valid pointer value in
> > ci_cpu_cacheinfo. In other words, if the memory has been already
> > allocated, it is assumed to be correct. This is why I used a separate
> > per-cpu variable in my original RFC patch.
> > 
> > Would it be horrible to add a bool field to struct cpu_cacheinfo that
> > tells us if the levels/leaves are based on early detection? The field
> > would be set by fetch_cache_info() when it calls early_cache_level() as
> > fallback. Then detect_cache_attributes() could avoid skipping the call
> > to init_cache_level() when the flag is set. The nice thing about this
> > is that it doesn't affect other architectures - if they don't implement
> > early_cache_level(), the flag is never set and then the behavior of
> > detect_cache_attributes() is basically unchanged.
> 
> FWIW I prefer the solution with the bool/flag over the per-cpu variable.
> At:
> https://lore.kernel.org/all/20230327115953.788244-4-pierre.gondois@arm.com/
> I suggested to add a 'bool use_arch_info;' and nobody seemed to complain
> about it (so far). I assume the field could be re-used for this purpose.

In that patch, 'bool use_arch_info' is a local variable (not a field in
struct cpu_cacheinfo), so I'm not exactly sure how it can be reused.
But for now I'm going to assume it should be fine to add a field :)

I will prepare a v2 patch based on everything we have discussed so far
and send it, then I guess we can take it from there.

> > @@ -446,8 +451,11 @@ int fetch_cache_info(unsigned int cpu)
> >   */
> >   this_cpu_ci->num_leaves = levels + split_levels;
> >   }
> > - if (!cache_leaves(cpu))
> > - return -ENOENT;
> > + if (!cache_leaves(cpu)) {
> > + ret = early_cache_level(cpu);
> > + if (ret < 0)
> > + return ret;
> > + }
> 
> If there are no PPTT, acpi_get_cache_info() should return an error code.

What if the PPTT tables are there and acpi_get_cache_info() returns 0
but the number of leaves is still 0 in the end? This *does* happen in
my test environment (qemu -M virt -cpu cortex-a57). I agree that then
the system is likely broken (and in my case it's probably a qemu bug),
but my point is that the kernel should still try its best to do the
right thing and fall back to auto-detection.

> The patch-set at:
> https://lore.kernel.org/all/20230327115953.788244-1-pierre.gondois@arm.com/
> should make init_of_cache_level() return error code aswell.
> 
> I think these error codes should be caught instead of checking:
> + if (!cache_leaves(cpu))


You have a valid point here. We *must* fall back to auto-detection if
there is an error reading the DT/ACPI. Currently the function returns
early in that case.

Maybe we should check both the error code and the number of leaves? But
in that case, the error code becomes irrelevant, since the number of
leaves is likely 0 anyway if there is an error.

Best regards,
Radu




More information about the linux-arm-kernel mailing list