[PATCH v6 3/3] ARM: Check if a CPU has gone offline

Rob Herring robherring2 at gmail.com
Thu Apr 24 08:11:20 PDT 2014


On Thu, Apr 24, 2014 at 9:33 AM, Ashwin Chaugule
<ashwin.chaugule at linaro.org> wrote:
> On 18 April 2014 11:21, Ashwin Chaugule <ashwin.chaugule at linaro.org> wrote:
>> Hi Mark,
>>
>>
>> On 17 April 2014 15:50, Mark Rutland <mark.rutland at arm.com> wrote:
>>> On Thu, Apr 17, 2014 at 08:15:46PM +0100, Ashwin Chaugule wrote:
>>>> PSCIv0.2 adds a new function called AFFINITY_INFO, which
>>>> can be used to query if a specified CPU has actually gone
>>>> offline. Calling this function via cpu_kill ensures that
>>>> a CPU has quiesced after a call to cpu_die.

[...]

>>> We can race with the dying CPU here -- if we call AFFINITY_INFO before
>>> the dying cpu is sufficiently far through its CPU_OFF call it won't
>>> register as OFF.
>>>
>>> Could we poll here instead (with a reasonable limit on the number of
>>> iterations)? That would enable us to not spuriously declare a CPU to be
>>> dead when it happened to take slightly longer than we expect to turn
>>> off.
>>
>> True. How about something like this?
>>
>>  int __ref psci_cpu_kill(unsigned int cpu)
>>  {
>> -       int err;
>> +       int err, retries = 0;
>>
>>         if (!psci_ops.affinity_info)
>>                 return 1;
>> -
>> +       /*
>> +        * cpu_kill could race with cpu_die and we can
>> +        * potentially end up declaring this cpu undead
>> +        * while it is dying. So retry a couple of times.
>> +        */
>> +retry:
>>         err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>>
>>         if (err != PSCI_AFFINITY_INFO_RET_OFF) {
>> +               if (++retries < 3) {
>> +                       pr_info("Retrying check for CPU kill: %d\n", retries);
>> +                       goto retry;
>> +               }
>>                 pr_err("psci: Cannot kill CPU:%d, psci ret val: %d\n",
>>                                 cpu, err);
>>                 /* Make platform_cpu_kill() fail. */
>>
>>
>>
>
>
> Hi Rob, I've already got your Reviewed-by on this patch without this
> "retry" thing. Are you okay with this as well? I can then roll it up
> in one patch.

Yes. My only comment is that I would perhaps add a sleep (or a delay,
if this context cannot sleep) on the retry. I'm not sure what a
reasonable time would be, but at least then you are waiting a defined
amount of time rather than however long this code takes to execute.
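
To illustrate, here is a rough user-space sketch of a bounded poll with
a fixed wait between attempts. It is not the patch itself. Everything
in it is a hypothetical stand-in: fake_affinity_info() simulates the
firmware query, fake_delay_ms() only records the requested wait (kernel
code would call msleep() or mdelay() depending on context), and the
names cpu_kill_poll(), max_tries and wait_ms are made up for the
example.

```c
#include <assert.h>

/* Illustrative values mirroring the PSCI v0.2 AFFINITY_INFO results. */
enum { AFFINITY_ON = 0, AFFINITY_OFF = 1 };

/* Simulated firmware state: polls that still report ON. (Assumption.) */
static int polls_until_off;

/* Total wait requested so far, recorded instead of really sleeping. */
static long waited_ms;

/* Hypothetical stand-in for psci_ops.affinity_info(). */
static int fake_affinity_info(unsigned int cpu)
{
	(void)cpu;
	if (polls_until_off > 0) {
		polls_until_off--;
		return AFFINITY_ON;
	}
	return AFFINITY_OFF;
}

/* Hypothetical stand-in for msleep()/mdelay(). */
static void fake_delay_ms(long ms)
{
	waited_ms += ms;
}

/*
 * Poll up to max_tries times, waiting wait_ms between attempts.
 * Returns 1 once the CPU reports OFF, 0 if it never does.
 */
static int cpu_kill_poll(unsigned int cpu, int max_tries, long wait_ms)
{
	int i;

	assert(max_tries > 0);

	for (i = 0; i < max_tries; i++) {
		if (fake_affinity_info(cpu) == AFFINITY_OFF)
			return 1;
		/* No point waiting after the final attempt. */
		if (i + 1 < max_tries)
			fake_delay_ms(wait_ms);
	}
	return 0;
}
```

With this shape the worst-case wait is (max_tries - 1) * wait_ms, so the
timeout is a defined quantity instead of depending on how fast the loop
happens to run.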

Rob


