[PATCH V4 2/3] arm64/mm/hotplug: Enable MEM_OFFLINE event handling

Gavin Shan gshan at redhat.com
Sun Oct 11 23:27:41 EDT 2020


Hi Anshuman,

On 10/6/20 1:59 PM, Anshuman Khandual wrote:
> On 10/01/2020 05:27 AM, Gavin Shan wrote:
>> On 9/29/20 11:54 PM, Anshuman Khandual wrote:
>>> This enables MEM_OFFLINE memory event handling. It will help intercept any
>>> possible error condition, such as boot memory somehow still getting offlined
>>> even after an explicit notifier failure, potentially due to a future change in
>>> the generic hotplug framework. This would help detect such scenarios and help
>>> debug further. While here, also report the first section whose offlining was
>>> attempted, or which got offlined.
>>>
>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>> Cc: Will Deacon <will at kernel.org>
>>> Cc: Mark Rutland <mark.rutland at arm.com>
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Cc: Steve Capper <steve.capper at arm.com>
>>> Cc: Mark Brown <broonie at kernel.org>
>>> Cc: linux-arm-kernel at lists.infradead.org
>>> Cc: linux-kernel at vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>>> ---
>>>    arch/arm64/mm/mmu.c | 29 +++++++++++++++++++++++++++--
>>>    1 file changed, 27 insertions(+), 2 deletions(-)
>>>
>>
>> This looks good to me except for a nit, which can be addressed
>> if it looks reasonable, and only when you get a chance to
>> respin.
>>
>> Reviewed-by: Gavin Shan <gshan at redhat.com>
>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 4e70f4fea06c..90a30f5ebfc0 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1482,13 +1482,38 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
>>>        unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
>>>        unsigned long pfn = arg->start_pfn;
>>>
>>> -    if (action != MEM_GOING_OFFLINE)
>>> +    if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
>>>            return NOTIFY_OK;
>>>
>>>        for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +        unsigned long start = PFN_PHYS(pfn);
>>> +        unsigned long end = start + (1UL << PA_SECTION_SHIFT);
>>> +
>>>            ms = __pfn_to_section(pfn);
>>> -        if (early_section(ms))
>>> +        if (!early_section(ms))
>>> +            continue;
>>> +
>>
>> This question is unrelated to the patch itself. It seems that
>> early_section() is coarse-grained, which means none of the memory
>> detected at boot time will be hotpluggable?
> 
> Right, that's the policy enforced on the arm64 platform for various
> critical reasons. Please refer to the earlier discussions around memory
> hot remove development on arm64.
> 

Thanks for the hints.
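
A userspace sketch of the section walk may make the granularity question concrete. It assumes 4K pages and 128MB sections (a PA_SECTION_SHIFT of 27), and it replaces early_section() with a hypothetical per-section flag array, sketch_early[]; none of the sketch_* names exist in the kernel. Because every section present at boot carries the early flag, any section-aligned offline request that covers one of them is caught, and the first such section is the one reported:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed geometry for this sketch only: 4K pages and 128MB sections.
 * The real values come from the arm64 kernel configuration.
 */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_PA_SECTION_SHIFT  27
#define SKETCH_PAGES_PER_SECTION (1UL << (SKETCH_PA_SECTION_SHIFT - SKETCH_PAGE_SHIFT))
#define SKETCH_NR_SECTIONS       16

/* Hypothetical stand-in for early_section(): true for sections present at boot. */
static bool sketch_early[SKETCH_NR_SECTIONS];

/*
 * Walk [start_pfn, start_pfn + nr_pages) one section at a time, as the
 * notifier does, and return the physical range of the first boot section
 * found. Returns false if the range contains no boot memory.
 */
static bool first_boot_section(unsigned long start_pfn, unsigned long nr_pages,
                               uint64_t *start, uint64_t *end)
{
    unsigned long end_pfn = start_pfn + nr_pages;
    unsigned long pfn;

    /* This sketch assumes section-aligned requests. */
    assert(start_pfn % SKETCH_PAGES_PER_SECTION == 0);
    assert(nr_pages % SKETCH_PAGES_PER_SECTION == 0);

    for (pfn = start_pfn; pfn < end_pfn; pfn += SKETCH_PAGES_PER_SECTION) {
        unsigned long nr = pfn / SKETCH_PAGES_PER_SECTION;

        assert(nr < SKETCH_NR_SECTIONS);
        if (!sketch_early[nr])
            continue;

        *start = (uint64_t)pfn << SKETCH_PAGE_SHIFT;              /* PFN_PHYS(pfn) */
        *end = *start + ((uint64_t)1 << SKETCH_PA_SECTION_SHIFT); /* one section */
        return true;
    }
    return false;
}
```

For instance, with sections 0 and 1 flagged as boot memory, a request spanning sections 1 and 2 is reported as [0x8000000 0x10000000], while a request spanning sections 4 and 5 passes through untouched.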

>>
>>> +        if (action == MEM_GOING_OFFLINE) {
>>> +            pr_warn("Boot memory [%lx %lx] offlining attempted\n", start, end);
>>>                return NOTIFY_BAD;
>>> +        } else if (action == MEM_OFFLINE) {
>>> +            /*
>>> +             * This should have never happened. Boot memory
>>> +             * offlining should have been prevented by this
>>> +             * very notifier. Probably some memory removal
>>> +             * procedure might have changed which would then
>>> +             * require further debug.
>>> +             */
>>> +            pr_err("Boot memory [%lx %lx] offlined\n", start, end);
>>> +
>>> +            /*
>>> +             * Core memory hotplug does not process a return
>>> +             * code from the notifier for MEM_OFFLINE event.
>>> +             * Error condition has been reported. Report as
>>> +             * ignored.
>>> +             */
>>> +            return NOTIFY_DONE;
>>> +        }
>>>        }
>>>        return NOTIFY_OK;
>>>    }
>>>
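
The comment about MEM_OFFLINE return codes can be illustrated with a simplified userspace model. The NOTIFY_* values and notifier_to_errno() below mirror the definitions in include/linux/notifier.h. sketch_offline() and the two sketch_*_notifier() callbacks are hypothetical: a heavily reduced stand-in for the core offline path, assumed here to behave as the patch comment describes, where only the MEM_GOING_OFFLINE result is turned into an errno and can abort the operation:

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the definitions in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

/* Userspace copy of the kernel's notifier_to_errno(). */
static int notifier_to_errno(int ret)
{
    ret &= ~NOTIFY_STOP_MASK;
    return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

enum sketch_action { SKETCH_GOING_OFFLINE, SKETCH_OFFLINE };

/* Hypothetical notifier that vetoes, like the arm64 one does for boot memory. */
static int sketch_boot_notifier(enum sketch_action action)
{
    return action == SKETCH_GOING_OFFLINE ? NOTIFY_BAD : NOTIFY_DONE;
}

/* Hypothetical notifier for hot-added memory: never objects. */
static int sketch_hotplug_notifier(enum sketch_action action)
{
    (void)action;
    return NOTIFY_OK;
}

/*
 * Simplified model of the core offline path (an assumption, not the
 * real code): only the MEM_GOING_OFFLINE result can abort, and the
 * MEM_OFFLINE result is discarded.
 */
static int sketch_offline(int (*notify)(enum sketch_action), bool *offlined)
{
    int ret;

    assert(notify && offlined);
    *offlined = false;

    ret = notifier_to_errno(notify(SKETCH_GOING_OFFLINE));
    if (ret)
        return ret;                 /* vetoed: -1, which the kernel reads as -EPERM */

    *offlined = true;               /* isolation and migration would happen here */
    (void)notify(SKETCH_OFFLINE);   /* return code ignored */
    return 0;
}
```

Under this model, whatever the notifier returns for MEM_OFFLINE (NOTIFY_DONE or NOTIFY_BAD) cannot change the outcome, which is what the NOTIFY_BAD versus NOTIFY_DONE question turns on.
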
>>
>> I think returning NOTIFY_BAD for MEM_OFFLINE wouldn't be a
>> bad idea, even though the core doesn't handle the return value.
>> With this, the code can be simplified. However, it's not a big
>> deal, and you can evaluate and change it when you need another respin:
>>
>>      pr_warn("Boot memory [%lx %lx] %s\n", start, end,
>>              (action == MEM_GOING_OFFLINE) ? "offlining attempted" : "offlined");
>>      return NOTIFY_BAD;
> 
> I wonder whether returning NOTIFY_BAD for the MEM_OFFLINE event could
> be somewhat risky if the generic hotplug mechanism changes later. But
> again, it is probably OK.
> 
> Regardless, I also wanted to differentiate the messages for the two
> cases: a warning, i.e. pr_warn(), for MEM_GOING_OFFLINE, which suggests
> an unexpected user action, but an error, i.e. pr_err(), for MEM_OFFLINE,
> which clearly indicates an error condition that needs to be debugged
> further.
> 

OK, fair enough; it looks good to me as well.
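
The two reporting styles discussed above can be compared side by side in a small userspace sketch. The boot-memory check is reduced to a bool, pr_warn()/pr_err() are replaced by a hypothetical severity field plus a formatted message, and all sketch_* names are illustrative only. Note that the %lx arguments come before the %s string, matching the order of the format specifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Values mirror the definitions in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

enum sketch_action { SKETCH_ONLINE, SKETCH_GOING_OFFLINE, SKETCH_OFFLINE };
enum sketch_level { SKETCH_NONE, SKETCH_WARN, SKETCH_ERR };

/* Hypothetical log record standing in for pr_warn()/pr_err(). */
struct sketch_report {
    enum sketch_level level;
    char msg[96];
};

/* True if this event on boot memory must be reported. */
static bool sketch_needs_report(enum sketch_action action, bool boot)
{
    return boot && (action == SKETCH_GOING_OFFLINE || action == SKETCH_OFFLINE);
}

/* Style in the patch: warn and veto before, error and NOTIFY_DONE after. */
static int sketch_split_policy(enum sketch_action action, bool boot,
                               unsigned long start, unsigned long end,
                               struct sketch_report *rep)
{
    rep->level = SKETCH_NONE;
    rep->msg[0] = '\0';
    if (!sketch_needs_report(action, boot))
        return NOTIFY_OK;

    if (action == SKETCH_GOING_OFFLINE) {
        rep->level = SKETCH_WARN;
        snprintf(rep->msg, sizeof(rep->msg),
                 "Boot memory [%lx %lx] offlining attempted", start, end);
        return NOTIFY_BAD;
    }
    rep->level = SKETCH_ERR;
    snprintf(rep->msg, sizeof(rep->msg), "Boot memory [%lx %lx] offlined", start, end);
    return NOTIFY_DONE;
}

/* Alternative from the review: one warning format, NOTIFY_BAD for both events. */
static int sketch_unified_policy(enum sketch_action action, bool boot,
                                 unsigned long start, unsigned long end,
                                 struct sketch_report *rep)
{
    rep->level = SKETCH_NONE;
    rep->msg[0] = '\0';
    if (!sketch_needs_report(action, boot))
        return NOTIFY_OK;

    rep->level = SKETCH_WARN;
    snprintf(rep->msg, sizeof(rep->msg), "Boot memory [%lx %lx] %s", start, end,
             action == SKETCH_GOING_OFFLINE ? "offlining attempted" : "offlined");
    return NOTIFY_BAD;
}
```

In the split version, a MEM_OFFLINE on boot memory shows up at error severity, which is the distinction argued for above; the unified version trades that for a single message format and a NOTIFY_BAD that the core ignores for MEM_OFFLINE.
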

Cheers,
Gavin





More information about the linux-arm-kernel mailing list