[PATCH V4 2/3] arm64/mm/hotplug: Enable MEM_OFFLINE event handling
Gavin Shan
gshan at redhat.com
Sun Oct 11 23:27:41 EDT 2020
Hi Anshuman,
On 10/6/20 1:59 PM, Anshuman Khandual wrote:
> On 10/01/2020 05:27 AM, Gavin Shan wrote:
>> On 9/29/20 11:54 PM, Anshuman Khandual wrote:
>>> This enables MEM_OFFLINE memory event handling. It will help intercept any
>>> possible error condition, such as boot memory somehow still getting offlined
>>> even after an explicit notifier failure, potentially because of a future
>>> change in the generic hotplug framework. This would help detect such scenarios
>>> and debug them further. While here, also call out the first section that is
>>> being attempted for offlining or that got offlined.
>>>
>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>> Cc: Will Deacon <will at kernel.org>
>>> Cc: Mark Rutland <mark.rutland at arm.com>
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Cc: Steve Capper <steve.capper at arm.com>
>>> Cc: Mark Brown <broonie at kernel.org>
>>> Cc: linux-arm-kernel at lists.infradead.org
>>> Cc: linux-kernel at vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>>> ---
>>> arch/arm64/mm/mmu.c | 29 +++++++++++++++++++++++++++--
>>> 1 file changed, 27 insertions(+), 2 deletions(-)
>>>
>>
>> This looks good to me except for one nit. It can be improved, if
>> that looks reasonable to you, and only when you get a chance for
>> a respin.
>>
>> Reviewed-by: Gavin Shan <gshan at redhat.com>
>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 4e70f4fea06c..90a30f5ebfc0 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1482,13 +1482,38 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
>>>  	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
>>>  	unsigned long pfn = arg->start_pfn;
>>> -	if (action != MEM_GOING_OFFLINE)
>>> +	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
>>>  		return NOTIFY_OK;
>>>  	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +		unsigned long start = PFN_PHYS(pfn);
>>> +		unsigned long end = start + (1UL << PA_SECTION_SHIFT);
>>> +
>>>  		ms = __pfn_to_section(pfn);
>>> -		if (early_section(ms))
>>> +		if (!early_section(ms))
>>> +			continue;
>>> +
>>
>> This question is unrelated to the patch itself. It seems
>> early_section() is coarse, which means that all memory detected
>> at boot time won't be hotpluggable?
>
> Right, that's the policy being enforced on the arm64 platform for various
> critical reasons. Please refer to the earlier discussions around memory
> hot remove development on arm64.
>
Thanks for the hints.
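To check my own understanding of the granularity, here is a minimal
userspace sketch of the per-section walk the notifier does. It is not
kernel code: the 4K page size and SECTION_SIZE_BITS of 30 are assumptions
for illustration, and is_boot_section() is a hypothetical stand-in for
early_section() that simply treats everything below a given pfn as boot
memory.

```c
#include <stdio.h>

/* Assumed values for illustration only: 4K pages, 1GB sections */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	30
#define PA_SECTION_SHIFT	SECTION_SIZE_BITS
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)
#define PFN_PHYS(pfn)		((unsigned long long)(pfn) << PAGE_SHIFT)

/* Hypothetical stand-in for early_section(): boot memory ends at boot_end_pfn */
static int is_boot_section(unsigned long pfn, unsigned long boot_end_pfn)
{
	return pfn < boot_end_pfn;
}

/* Walk [start_pfn, start_pfn + nr_pages) one section at a time and
 * return how many of the visited sections are boot memory. */
static unsigned long count_boot_sections(unsigned long start_pfn,
					 unsigned long nr_pages,
					 unsigned long boot_end_pfn)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn, hits = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		unsigned long long start = PFN_PHYS(pfn);
		unsigned long long end = start + (1ULL << PA_SECTION_SHIFT);

		if (!is_boot_section(pfn, boot_end_pfn))
			continue;

		printf("Boot memory [%llx %llx]\n", start, end);
		hits++;
	}
	return hits;
}
```

Under these assumptions a section covers 262144 pages, so the decision is
always made for a whole 1GB block and never for a part of one.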
>>
>>> +		if (action == MEM_GOING_OFFLINE) {
>>> +			pr_warn("Boot memory [%lx %lx] offlining attempted\n", start, end);
>>>  			return NOTIFY_BAD;
>>> +		} else if (action == MEM_OFFLINE) {
>>> +			/*
>>> +			 * This should have never happened. Boot memory
>>> +			 * offlining should have been prevented by this
>>> +			 * very notifier. Probably some memory removal
>>> +			 * procedure might have changed which would then
>>> +			 * require further debug.
>>> +			 */
>>> +			pr_err("Boot memory [%lx %lx] offlined\n", start, end);
>>> +
>>> +			/*
>>> +			 * Core memory hotplug does not process a return
>>> +			 * code from the notifier for MEM_OFFLINE event.
>>> +			 * Error condition has been reported. Report as
>>> +			 * ignored.
>>> +			 */
>>> +			return NOTIFY_DONE;
>>> +		}
>>>  	}
>>>  	return NOTIFY_OK;
>>>  }
>>>
>>
>> I think returning NOTIFY_BAD for MEM_OFFLINE wouldn't be a
>> bad idea, even though the core isn't handling the errno. With this,
>> the code can be simplified. However, it's not a big deal and
>> you can evaluate and change it when you need another respin:
>>
>>     pr_warn("Boot memory [%lx %lx] %s\n", start, end,
>>             (action == MEM_GOING_OFFLINE) ? "offlining attempted" : "offlined");
>>     return NOTIFY_BAD;
>
> I wonder whether returning NOTIFY_BAD for the MEM_OFFLINE event could
> be somewhat risky if the generic hotplug mechanism were to change later.
> But again, it might just be OK.
>
> Regardless, I also wanted to differentiate the error messages for the two
> cases: a warning message, i.e. pr_warn(), for MEM_GOING_OFFLINE, which
> suggests an unexpected user action, and an error message, i.e. pr_err(),
> for MEM_OFFLINE, which clearly indicates an error condition that needs
> to be debugged further.
>
OK, fair enough. It looks good to me either way.
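
For the record, here is a minimal userspace sketch of why the two return
values differ in effect. The NOTIFY_* values and notifier_to_errno() mirror
include/linux/notifier.h as I remember them. The enum values and
boot_section_verdict() are hypothetical stand-ins, not kernel code. As far
as I can tell, the core only converts the notifier result into an errno for
MEM_GOING_OFFLINE, so that is the only point where a veto takes effect.

```c
/* Userspace model of the notifier return codes; nothing here is kernel code. */
#define NOTIFY_DONE		0x0000	/* don't care */
#define NOTIFY_OK		0x0001	/* suits me */
#define NOTIFY_STOP_MASK	0x8000	/* don't call further notifiers */
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)	/* bad/veto action */

/* Stand-in event identifiers; the numeric values are not the kernel's. */
enum mem_action { MEM_GOING_OFFLINE, MEM_OFFLINE, MEM_OTHER };

/* Same logic as the kernel helper: strip the stop bit, map to an errno. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical helper: what the patch returns for a boot memory section. */
static int boot_section_verdict(enum mem_action action)
{
	switch (action) {
	case MEM_GOING_OFFLINE:
		return NOTIFY_BAD;	/* veto: offlining is refused */
	case MEM_OFFLINE:
		return NOTIFY_DONE;	/* too late to veto, only reported */
	default:
		return NOTIFY_OK;	/* other events are not our business */
	}
}
```

With these definitions, notifier_to_errno(boot_section_verdict(MEM_GOING_OFFLINE))
evaluates to -1, while the MEM_OFFLINE verdict maps to 0, which matches the
"report as ignored" comment in the patch.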
Cheers,
Gavin