[PATCH v5 03/18] ACPI: processor: Register deferred CPUs from acpi_processor_get_info()

Wed Apr 17 09:19:37 PDT 2024

On Wed, Apr 17, 2024 at 5:01 PM Salil Mehta <salil.mehta at huawei.com> wrote:
>
> HI Rafael,
>
> >  From: Rafael J. Wysocki <rafael at kernel.org>
> >  Sent: Monday, April 15, 2024 5:39 PM
> >
> >  On Mon, Apr 15, 2024 at 5:31 PM Salil Mehta <salil.mehta at huawei.com>  wrote:
> >  >
> >  > >  From: Rafael J. Wysocki <rafael at kernel.org>
> >  > >  Sent: Monday, April 15, 2024 1:51 PM
> >  > >
> >  > >  On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta
> >  > > <salil.mehta at huawei.com>
> >  > >  wrote:
> >  > >  >
> >
> >  [cut]
> >
> >  > >  > >  Though virtualization happily jumped on it to hot add/remove
> >  > > CPUs  > > to/from  a guest.
> >  > >  > >
> >  > >  > >  There are limitations to this and we learned it the hard way
> >  > > on  > > X86. At the  end we came up with the following restrictions:
> >  > >  > >
> >  > >  > >      1) All possible CPUs have to be advertised at boot time via  firmware
> >  > >  > >         (ACPI/DT/whatever) independent of them being present at boot time
> >  > >  > >         or not.
> >  > >  > >
> >  > >  > >         That guarantees proper sizing and ensures that associations
> >  > >  > >         between hardware entities and software representations and the
> >  > >  > >         resulting topology are stable for the lifetime of a system.
> >  > >  > >
> >  > >  > >         It is really required to know the full topology of the system at
> >  > >  > >         boot time especially with hybrid CPUs where some of the cores
> >  > >  > >         have hyperthreading and the others do not.
> >  > >  > >
> >  > >  > >
> >  > >  > >      2) Hot add can only mark an already registered (possible) CPU
> >  > >  > >         present. Adding non-registered CPUs after boot is not possible.
> >  > >  > >
> >  > >  > >         The CPU must have been registered in #1 already to ensure that
> >  > >  > >         the system topology does not suddenly change in an incompatible
> >  > >  > >         way at run-time.
> >  > >  > >
> >  > >  > >  The same restriction would apply to real physical hotplug. I
> >  > > don't  > > think that's  any different for ARM64 or any other architecture.
> >  > >  >
> >  > >  >
> >  > >  > There is a difference:
> >  > >  >
> >  > >  > 1.   ARM arch does not allows for any processor to be NOT present. Hence,  because of
> >  > >  > this restriction any of its related per-cpu components must be
> >  > > present  > and enumerated at the boot time as well (exposed by
> >  > > firmware and  > ACPI). This means all the enumerated processors will
> >  > > be marked as  > 'present' but they might exist in NOT enabled (_STA.enabled=0) state.
> >  > >  >
> >  > >  > There was one clear difference and please correct me if I'm wrong
> >  > > > here,  for x86, the LAPIC associated with the x86 core can be brought online later even after boot?
> >  > >  >
> >  > >  > But for ARM Arch, processors and its corresponding per-cpu
> >  > > components  > like redistributors all need to be present and
> >  > > enumerated during the  > boot time. Redistributors are part of ALWAYS-ON power domain.
> >  > >
> >  > >  OK
> >  > >
> >  > >  So what exactly is the problem with this and what does
> >  > >  acpi_processor_add() have to do with it?
> >  >
> >  >
> >  > For ARM Arch, during boot time, it should add processor as if no
> >  > hotplug exists. But later, in context to the (fake) hotplug trigger
> >  > from the virtualizer as a result of the CPU (un)plug action  it should just
> >  end up in registering the already present CPU with the Linux Driver Model.
> >
> >  So let me repeat this last time: acpi_processor_add() cannot do that,
> >  because (as defined today) it rejects CPUs with the "enabled" bit clear in  _STA.
>
>
> I understand that now because you have placed a check recently. sorry for stretching
> it a bit but I wanted to clearly understand the reason for this behavior. Is it because,
>
> 1. It does not makes sense to add a disabled but present/functional processor or
>     perhaps there are repercussions to support such a behavior?

Yes because it is unusable.

> Or
>
> 2. None of the existing processors need such a behavior?
>
>
>
> >  > >  Do you want the per-CPU structures etc. to be created from the
> >  > >  acpi_processor_add() path?
> >  >
> >  >
> >  > I referred to the components related to ARM CPU Arch like redistributors etc.
> >  > which will get initialized in context to Arch specific _init code not
> >  > here. This i.e. acpi_processor_add() is arch agnostic code common to all architectures.
> >  >
> >  > [ A digression: You do have _weak functions which can be overridden to
> >  > arch specific  handling like  arch_(un)map_cpu() etc. but we can't use
> >  > those to defer initialize  the CPU related components - ARM Arch
> >  > constraint!]
> >
> >  Not right now, but they can be added I suppose.
> >
> >  >
> >  > >
> >  > >  This plain won't work because acpi_processor_add(), as defined in
> >  > > the  mainline kernel today (and the Jonathan's patches don't change
> >  > > that  AFAICS), returns an error for processor devices with the
> >  > > "enabled" bit clear  in _STA (it returns an error if the "present"
> >  > > bit is clear too, but that's  obvious), so it only gets to calling
> >  > > arch_register_cpu() if
> >  > >  *both* "present" and "enabled" _STA bits are set for the given
> >  > > processor  device.
> >  >
> >  >
> >  > If you are referring to the _STA check in the XX_hot_add_init() then
> >  > in the current kernel code it only checks for the
> >  > ACPI_STA_DEVICE_PRESENT flag and not the ACPI_STA_DEVICE_ENABLED flag(?).
> >
> >  No, I am not.  I'm referring to this code in 6.9-rc4:
> >
> >  static int acpi_processor_add(struct acpi_device *device,
> >                      const struct acpi_device_id *id) {
> >      struct acpi_processor *pr;
> >      struct device *dev;
> >      int result = 0;
> >
> >      if (!acpi_device_is_enabled(device))
> >          return -ENODEV;
>
>
> Ahh, sorry, I missed this check as this has been added recently. Yes, now your
> logic of having common legs makes more sense.
>
>
> >
> >      ...
> >  }
> >
> >  where acpi_device_is_enabled() is defined as follows:
> >
> >  bool acpi_device_is_enabled(const struct acpi_device *adev) {
> >      return adev->status.present && adev->status.enabled; }
>
>
> Got it.
>
>
> [digression note:]
> BTW, I'm wondering why we are checking adev->status.present
> as having adev->status.enabled as set and adev->status.present
> as unset would mean firmware has a BUG. If we really want to
> check this then we should rather flag a warning on detection
> of this condition?

Adding a warning would be fine with me.

> Either this:
>  bool acpi_device_is_enabled(const struct acpi_device *adev) {
>
>      if (!acpi_device_is_present(adev)) {
>             if (adev->status.enabled)
>                        pr_debug("Device [%s] status inconsistent: Enabled but not Present\n",
>                                           device->pnp.bus_id);
>             return false;
>      }
>       return  true;
> }
>
> Ideally this inconsistency should have been checked in acpi_bus_get_status()
> and above function should have been just,

Yes, it can be added there.  It can even clear 'enabled' if 'present' is clear.

> file: drivers/acpi/scan.c
> bool acpi_device_is_enabled(const struct acpi_device *adev) {
>       return !!adev->status.enabled; }

Sure.

> file: drivers/acpi/bus.c
> int acpi_bus_get_status(struct acpi_device *device)
> {
>        [...]
>
>         status = acpi_bus_get_status_handle(device->handle, &sta);
>         if (ACPI_FAILURE(status))
>                 return -ENODEV;
>
>         acpi_set_device_status(device, sta);
>
>         if (device->status.functional && !device->status.present) {
>                 pr_debug("Device [%s] status [%08x]: functional but not present\n",
>                          device->pnp.bus_id, (u32)sta);
>         }
>
> +       if (device->status.enabled && !device->status.present) {
> +               pr_debug("BUG: Device [%s] status [%08x]: enabled but not present\n",
> +                        device->pnp.bus_id, (u32)sta);
> +                         /* any specific handling here? */
> +       }
>
>         pr_debug("Device [%s] status [%08x]\n", device->pnp.bus_id, (u32)sta);
>         return 0;
> }
>
> >
> >  > The code being reviewed has changed
> >  > exactly that behavior for 2 legs i.e. make-present and make-enabled legs.
> >
> >  I'm not sure what you mean here, but the code above means that
> >  acpi_processor_add) does not distinguish between CPU with the "present"
> >  bit clear (in which case the "enabled" bit must also be clear as per the spec)
> >  and CPUs with the "present" bit set and the "enabled" bit clear.  These two
> >  cases are handled in the same way.
> >
> >  > I'm open to further address your point clearly.
> >
> >  I hope that the above is clear enough.
>
>
> Yes, clear now. I missed the new check.
>
> >
> >  > >
> >  > >  That, BTW, is why I keep saying that from the ACPI CPU enumeration
> >  > > code  perspective, there is no difference between "present" and
> >  "enabled".
> >  >
> >  >
> >  > Agreed but there is still a subtle difference.  Enumeration happens
> >  > once and for all the processors during the boot time. And as confirmed
> >  > by yourself and Thomas as well that even in x86 arch all the
> >  > processors will be discovered and their topology fixed during the boot
> >  > time which is effectively the same behavior as in the ARM Arch. But
> >  > ARM assumes those 'present' bits in the present masks to be set during
> >  > the boot time which is not like x86(?).  Hence, 'present cpu' Bits
> >  > will always be equal to 'possible cpu' Bits. This is a constraint put by the
> >  ARM maintainers and looks unshakable.
> >
> >  Yes, there are differences between architectures, but the ACPI code is (or
> >  at least should be) architecture-agnostic (as you said somewhere above).
> >  So why does this matter for the ACPI code?
>
>
> It should not. There were few bits like overriding of arch_register_cpu() which
> was not allowed by ARM folks in 2020 when I floated the first prototype.
>
>
> >  > >  > 2.  Agreed regarding the topology. Are you suggesting that we
> >  > > must  > call arch_register_cpu() during boot time for all the 'present'  CPUs?
> >  > >  > Even if that's the case, we might still want to defer
> >  > > registration of  > the cpu device (register_cpu() API) with the
> >  > > Linux device model. Later  > is what we are doing to hide/unhide the
> >  > > CPUs from the user while  STA.Enabled Bit is toggled due to CPU  (un)plug action.
> >  > >
> >  > >  There are two ways to approach this IMV, and both seem to be valid
> >  > > in  principle.
> >  > >
> >  > >  One would be to treat CPUs with the "enabled" bit clear as not
> >  > > present and  create all of the requisite data structures for them
> >  > > when they become  available (in analogy with the "real hot-add" case).
> >  >
> >  >
> >  > Right. This one is out-of-scope for ARM Arch as we cannot defer any
> >  > Arch specific sizing and initializations after boot i.e. when
> >  > processor_add() gets called again later as a trigger of CPU plug action at the Virtualizer.
> >  >
> >  >
> >  > >
> >  > >  The alternative one is to create all of the requisite data
> >  > > structures for the  CPUs that you find during boot, but register CPU
> >  > > devices for those having  the "enabled" _STA bit set.
> >  >
> >  >
> >  > Correct. and we defer the registration for CPUs with online-capable
> >  > Bit set in the ACPI MADT/GICC data structure. These CPUs basically
> >  > form set of hot-pluggable CPUs on ARM.
> >  >
> >  >
> >  > >
> >  > >  It looks like you have chosen the second approach, which is fine
> >  > > with me  (although personally, I would rather choose the first one),
> >  > > but then your  arch code needs to arrange for the requisite CPU data
> >  > > structures etc. to be  set up before acpi_processor_add() gets
> >  > > called because, as per the above,  the latter just rejects CPUs with the  "enabled" _STA bit clear.
> >  >
> >  > Yes, correct. First one is not possible - at least for now and to have
> >  > that it will require lot of rework especially at GIC. But there are
> >  > many other arch components (like timers, PMUs, etc.) whose behavior
> >  > needs to be specified somewhere in the architecture as well. All these are closely coupled with the ARM CPU architecture.
> >  > (it's beyond this discussion and lets leave that to ARM folks).
> >  >
> >  > This patch-set has a change to deal with ACPI _STA.Enabled Bit accordingly.
> >
> >  Well, I'm having a hard time with this.
> >
> >  As far as CPU enumeration goes, if the "enabled" bit is clear in _STA, it does
> >  not happen at all.  Both on ARM and on x86.
>
> sure, I can see that now.
>
> >
> >  Now tell me why there need to be two separate code paths calling
> >  arch_register_cpu() in acpi_processor_add()?
>
>
> As mentioned above, in the first prototype I floated in the year 2020 any attempts
> to override the __weak call of arch_register_cpu() for ARM64 was discouraged.
> Though, the reasons might have changed now as some code has been moved.
>
> Once we are allowed to override the calls then there are many more possibilities
> which open up to simplify the code further.

Well, IMV this should just be an arch function with no __weak
defaults, because the default would probably be unusable in practice
anyway.