[PATCH v5 03/18] ACPI: processor: Register deferred CPUs from acpi_processor_get_info()

Mon Apr 15 05:37:07 PDT 2024

Hi Rafael,

>  From: Rafael J. Wysocki <rafael at kernel.org>
>  Sent: Monday, April 15, 2024 1:04 PM
>  
>  On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
>  <Jonathan.Cameron at huawei.com> wrote:
>  >
>  > On Mon, 15 Apr 2024 13:37:08 +0200
>  > "Rafael J. Wysocki" <rafael at kernel.org> wrote:
>  >
>  > > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
>  > > <Jonathan.Cameron at huawei.com> wrote:
>  > > >
>  > > > On Sat, 13 Apr 2024 01:23:48 +0200 Thomas Gleixner
>  > > > <tglx at linutronix.de> wrote:
>  > > >
>  > > > > Russell!
>  > > > >
>  > > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
>  > > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
>  > > > > >> > As for the cpu locking, I couldn't find anything in
>  > > > > >> > arch_register_cpu() that depends on the cpu_maps_update
>  > > > > >> > stuff nor needs the cpus_write_lock being taken - so I've
>  > > > > >> > no idea why the "make_present" case takes these locks.
>  > > > > >>
>  > > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask,
>  > > > > >> after early boot must hold the appropriate write locks.
>  > > > > >> Otherwise it would be possible to online a CPU which just got
>  > > > > >> marked present, but the registration has not completed yet.
>  > > > > >
>  > > > > > Yes. As far as I've been able to determine,
>  > > > > > arch_register_cpu() doesn't manipulate any of the CPU masks.
>  > > > > > All it seems to be doing is initialising the struct cpu,
>  > > > > > registering the embedded struct device, and setting up the sysfs  links to its NUMA node.
>  > > > > >
>  > > > > > There is nothing obvious in there which manipulates any CPU
>  > > > > > masks, and this is rather my fundamental point when I said "I
>  > > > > > couldn't find anything in arch_register_cpu() that depends on ...".
>  > > > > >
>  > > > > > If there is something, then comments in the code would be a
>  > > > > > useful aid because it's highly non-obvious where such a
>  > > > > > manipulation is located, and hence why the locks are necessary.
>  > > > >
>  > > > > acpi_processor_hotadd_init()
>  > > > > ...
>  > > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id,
>  > > > > &pr->id);
>  > > > >
>  > > > > That ends up in fiddling with cpu_present_mask.
>  > > > >
>  > > > > I grant you that arch_register_cpu() is not, but it might rely
>  > > > > on the external locking too. I could not be bothered to figure that  out.
>  > > > >
>  > > > > >> Define "real hotplug" :)
>  > > > > >>
>  > > > > >> Real physical hotplug does not really exist. That's at least
>  > > > > >> true for x86, where the physical hotplug support was chased
>  > > > > >> for a while, but never ended up in production.
>  > > > > >>
>  > > > > >> Though virtualization happily jumped on it to hot add/remove
>  > > > > >> CPUs to/from a guest.
>  > > > > >>
>  > > > > >> There are limitations to this and we learned it the hard way
>  > > > > >> on X86. At the end we came up with the following restrictions:
>  > > > > >>
>  > > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
>  > > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
>  > > > > >>        or not.
>  > > > > >>
>  > > > > >>        That guarantees proper sizing and ensures that associations
>  > > > > >>        between hardware entities and software representations and the
>  > > > > >>        resulting topology are stable for the lifetime of a system.
>  > > > > >>
>  > > > > >>        It is really required to know the full topology of the system at
>  > > > > >>        boot time especially with hybrid CPUs where some of the cores
>  > > > > >>        have hyperthreading and the others do not.
>  > > > > >>
>  > > > > >>
>  > > > > >>     2) Hot add can only mark an already registered (possible) CPU
>  > > > > >>        present. Adding non-registered CPUs after boot is not  possible.
>  > > > > >>
>  > > > > >>        The CPU must have been registered in #1 already to ensure  that
>  > > > > >>        the system topology does not suddenly change in an incompatible
>  > > > > >>        way at run-time.
>  > > > > >>
>  > > > > >> The same restriction would apply to real physical hotplug. I
>  > > > > >> don't think that's any different for ARM64 or any other architecture.
>  > > > > >
>  > > > > > This makes me wonder whether the Arm64 has been barking up the
>  > > > > > wrong tree then, and whether the whole "present" vs "enabled"
>  > > > > > thing comes from a misunderstanding as far as a CPU goes.
>  > > > > >
>  > > > > > However, there is a big difference between the two. On x86, a
>  > > > > > processor is just a processor. On Arm64, a "processor" is a
>  > > > > > slice of the system (includes the interrupt controller, PMUs
>  > > > > > etc) and we must enumerate those even when the processor
>  > > > > > itself is not enabled. This is the whole reason there's a
>  > > > > > difference between "present" and "enabled" and why there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
>  > > > > > The processor never actually goes away in arm64, it's just
>  > > > > > prevented from being used.
>  > > > >
>  > > > > It's the same on X86 at least in the physical world.
>  > > >
>  > > > There were public calls on this via the Linaro Open Discussions
>  > > > group, so I can talk a little about how we ended up here.  Note
>  > > > that (in my
>  > > > opinion) there is zero chance of this changing - it took us well
>  > > > over a year to get to this conclusion.  So if we ever want ARM
>  > > > vCPU HP we need to work within these constraints.
>  > > >
>  > > > The ARM architecture folk (the ones defining the ARM ARM, relevant
>  > > > ACPI specs etc, not the kernel maintainers) are determined that
>  > > > they want to retain the option to do real physical CPU hotplug in
>  > > > the future with all the necessary work around dynamic interrupt
>  > > > controller initialization, debug and many other messy corners.
>  > >
>  > > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
>  > >
>  > > > Thus anything defined had to be structured in a way that was  'different'
>  > > > from that.
>  > >
>  > > Apparently, that's where things got confused.
>  > >
>  > > > I don't mind the proposed flattening of the 2 paths if the ARM
>  > > > kernel maintainers are fine with it but it will remove the
>  > > > distinctions and we will need to be very careful with the CPU
>  > > > masks - we can't handle them the same as x86 does.
>  > >
>  > > At the ACPI code level, there is no distinction.
>  > >
>  > > A CPU that was not available before has just become available.  The
>  > > platform firmware has notified the kernel about it and now
>  > > acpi_processor_add() runs.  Why would it need to use different code
>  > > paths depending on what _STA bits were clear before?
>  >
>  > I think we will continue to disagree on this.  To my mind and from the
>  > ACPI specification, they are two different state transitions with
>  > different required actions.
>  
>  Well, please be specific: What exactly do you mean here and which parts of
>  the spec are you talking about?
>  
>  > Those state transitions are an ACPI level thing not an arch level one.
>  > However, I want a solution that moves things forwards so I'll give
>  > pushing that entirely into the arch code a try.
>  
>  Thanks!
>  
>  Though I think that there is a disconnect between us that needs to be
>  clarified first.
>  
>  > >
>  > > Yes, there is some arch stuff to be called and that arch stuff
>  > > should figure out what to do to make things actually work.
>  > >
>  > > > I'll get on with doing that, but do need input from Will / Catalin / James.
>  > > > There are some quirks that need calling out as it's not quite a
>  > > > simple as it appears from a high level.
>  > > >
>  > > > Another part of that long discussion established that there is
>  > > > userspace (Android IIRC) in which the CPU present mask must
>  > > > include all CPUs at boot. To change that would be userspace ABI
>  > > > breakage so we can't do that.  Hence the dance around adding yet
>  > > > another mask to allow the OS to understand which CPUs are 'present'
>  but not possible to online.
>  > > >
>  > > > Flattening the two paths removes any distinction between calls
>  > > > that are for real hotplug and those that are for this online capable path.
>  > >
>  > > Which calls exactly do you mean?
>  >
>  > At the moment he distinction does not exist (because x86 only supports
>  > fake physical CPU HP and arm64 only vCPU HP / online capable), but if
>  > the architecture is defined for arm64 physical hotplug in the future
>  > we would need to do interrupt controller bring up + a lot of other stuff.
>  >
>  > It may be possible to do that in the arch code - will be hard to
>  > verify that until that arch is defined  Today all I need to do is
>  > ensure that any attempt to do present bit setting for ARM64 returns an error.
>  > That looks to be straight forward.
>  
>  OK
>  
>  >
>  > >
>  > > > As a side note, the indicating bit for these flows is defined in
>  > > > ACPI for x86 from ACPI 6.3 as a flag in Processor Local APIC (the
>  > > > ARM64 definition is a cut and paste of that text).  So someone is
>  > > > interested in this distinction on x86. I can't say who but if you
>  > > > have a mantis account you can easily follow the history and it
>  > > > might be instructive to not everyone considering the current x86
>  > > > flow the right way to do it.
>  > >
>  > > So a physically absent processor is different from a physically
>  > > present processor that has not been disabled.  No doubt about this.
>  > >
>  > > That said, I'm still unsure why these two cases require two
>  > > different code paths in acpi_processor_add().
>  >
>  > It might be possible to push the checking down into
>  > arch_register_cpu() and have that for now reject any attempt to do
>  physical CPU HP on arm64.
>  > It is that gate that is vital to getting this accepted by ARM.
>  >
>  > I'm still very much stuck on the hotadd_init flag however, so any
>  > suggestions on that would be very welcome!
>  
>  I need to do some investigation which will take some time I suppose.

You might find below cover letter and links to the presentations useful:

1. https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
2. https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
3. https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
4. https://sched.co/eE4m

Best regards
Salil.