[PATCH v6 3/3] ARM: Check if a CPU has gone offline

Ashwin Chaugule ashwin.chaugule at linaro.org
Fri Apr 18 08:21:22 PDT 2014


Hi Mark,


On 17 April 2014 15:50, Mark Rutland <mark.rutland at arm.com> wrote:
> On Thu, Apr 17, 2014 at 08:15:46PM +0100, Ashwin Chaugule wrote:
>> PSCIv0.2 adds a new function called AFFINITY_INFO, which
>> can be used to query if a specified CPU has actually gone
>> offline. Calling this function via cpu_kill ensures that
>> a CPU has quiesced after a call to cpu_die.
>>
>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule at linaro.org>
>> Reviewed-by: Rob Herring <robh at kernel.org>
>> ---
>>  arch/arm/kernel/psci_smp.c | 21 +++++++++++++++++++++
>>  include/uapi/linux/psci.h  |  5 +++++
>>  2 files changed, 26 insertions(+)
>>
>> diff --git a/arch/arm/kernel/psci_smp.c b/arch/arm/kernel/psci_smp.c
>> index 570a48c..c6f1420 100644
>> --- a/arch/arm/kernel/psci_smp.c
>> +++ b/arch/arm/kernel/psci_smp.c
>> @@ -16,6 +16,7 @@
>>  #include <linux/init.h>
>>  #include <linux/smp.h>
>>  #include <linux/of.h>
>> +#include <uapi/linux/psci.h>
>>
>>  #include <asm/psci.h>
>>  #include <asm/smp_plat.h>
>> @@ -66,6 +67,25 @@ void __ref psci_cpu_die(unsigned int cpu)
>>         /* We should never return */
>>         panic("psci: cpu %d failed to shutdown\n", cpu);
>>  }
>> +
>> +int __ref psci_cpu_kill(unsigned int cpu)
>> +{
>> +     int err;
>> +
>> +     if (!psci_ops.affinity_info)
>> +             return 1;
>> +
>> +     err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>> +
>> +     if (err != PSCI_AFFINITY_INFO_RET_OFF) {
>> +             pr_err("psci: Cannot kill CPU:%d, psci ret val: %d\n",
>> +                             cpu, err);
>> +             /* Make platform_cpu_kill() fail. */
>> +             return 0;
>> +     }
>
> We can race with the dying CPU here -- if we call AFFINITY_INFO before
> the dying cpu is sufficiently far through its CPU_OFF call it won't
> register as OFF.
>
> Could we poll here instead (with a reasonable limit on the number of
> iterations)? That would enable us to not spuriously declare a CPU to be
> dead when it happened to take slightly longer than we expect to turn
> off.

True. How about something like this?

 int __ref psci_cpu_kill(unsigned int cpu)
 {
-       int err;
+       int err, retries = 0;

        if (!psci_ops.affinity_info)
                return 1;
-
+       /*
+        * cpu_kill could race with cpu_die and we can
+        * potentially end up declaring this cpu undead
+        * while it is dying. So retry a couple of times.
+        */
+retry:
        err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);

        if (err != PSCI_AFFINITY_INFO_RET_OFF) {
+               if (++retries < 3) {
+                       pr_info("Retrying check for CPU kill: %d\n", retries);
+                       goto retry;
+               }
                pr_err("psci: Cannot kill CPU:%d, psci ret val: %d\n",
                                cpu, err);
                /* Make platform_cpu_kill() fail. */
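
To make the bounded-polling idea easier to try outside the kernel, here is a
minimal user-space sketch. It is only an illustration, not the patch itself:
fake_affinity_info(), calls_until_off and poll_cpu_off() are hypothetical
names that stand in for psci_ops.affinity_info() and psci_cpu_kill(). The
enum values follow the PSCI 0.2 AFFINITY_INFO return encoding (0 = ON,
1 = OFF, 2 = ON_PENDING).

```c
#include <assert.h>
#include <stdio.h>

/* PSCI 0.2 AFFINITY_INFO return values. */
enum { AFF_ON = 0, AFF_OFF = 1, AFF_ON_PENDING = 2 };

/*
 * Hypothetical stand-in for psci_ops.affinity_info(): the "dying CPU"
 * keeps reporting ON for calls_until_off queries, then reports OFF.
 */
static int calls_until_off;

static int fake_affinity_info(void)
{
	if (calls_until_off > 0) {
		calls_until_off--;
		return AFF_ON;
	}
	return AFF_OFF;
}

/*
 * Query the CPU state at most max_polls times.
 * Returns 1 if the CPU was seen OFF (kill succeeded), 0 otherwise,
 * matching the cpu_kill convention used in the patch.
 */
static int poll_cpu_off(int max_polls)
{
	int i, err = AFF_ON;

	assert(max_polls > 0);

	for (i = 0; i < max_polls; i++) {
		err = fake_affinity_info();
		if (err == AFF_OFF)
			return 1;
		/* Assumption: real code would likely delay between polls. */
	}

	fprintf(stderr, "CPU still not off after %d polls, last ret: %d\n",
		max_polls, err);
	return 0;
}
```

With a limit of 3, a CPU that reports OFF on the third query is still treated
as killed, while one that needs longer is reported as a failure. The right
limit, and whether to sleep between queries, depends on how long CPU_OFF can
take on a given platform.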



Cheers,
Ashwin


