[PATCH V4 3/3] arm64/mm/hotplug: Ensure early memory sections are all online

Gavin Shan gshan at redhat.com
Mon Oct 12 00:07:51 EDT 2020


Hi Anshuman,

On 10/6/20 2:11 PM, Anshuman Khandual wrote:
> On 10/01/2020 06:23 AM, Gavin Shan wrote:
>> On 9/29/20 11:54 PM, Anshuman Khandual wrote:
>>> This adds a validation function that scans the entire boot memory and makes
>>> sure that all early memory sections are online. This check is essential for
>>> the memory notifier to work properly, as it cannot prevent any boot memory
>>> from offlining, if all sections are not online to begin with. The notifier
>>> registration is skipped, if this validation does not go through. Although
>>> the boot section scanning is selectively enabled with DEBUG_VM.
>>>
>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>> Cc: Will Deacon <will at kernel.org>
>>> Cc: Mark Rutland <mark.rutland at arm.com>
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Cc: Steve Capper <steve.capper at arm.com>
>>> Cc: Mark Brown <broonie at kernel.org>
>>> Cc: linux-arm-kernel at lists.infradead.org
>>> Cc: linux-kernel at vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>>> ---
>>>    arch/arm64/mm/mmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 59 insertions(+)
>>
>> I don't understand why this is necessary. The core already ensures the
>> corresponding section is online when trying to offline it. It's guaranteed
>> that the section is online when the notifier is triggered. I'm not sure if
>> there is anything I missed?
> 
> Current memory notifier blocks any boot memory hot removal attempt via
> blocking its offlining step itself. So if some sections in boot memory
> are not online (because of a bug or change in init sequence) by the
> time memory block device can be removed, the notifier loses the ability
> to prevent its removal. This validation ensures that the entire boot
> memory is in online state, otherwise it calls out sections that are not,
> with a warning that such boot memory can be removed.
> 

Well, I think that should be very rare. I guess you haven't observed the
erroneous case so far? However, I think it's fine to add the check
since it's only enabled with CONFIG_DEBUG_VM.

>>   
>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 90a30f5ebfc0..b67a657ea1ad 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1522,6 +1522,62 @@ static struct notifier_block prevent_bootmem_remove_nb = {
>>>        .notifier_call = prevent_bootmem_remove_notifier,
>>>    };
>>>    +/*
>>> + * This ensures that boot memory sections on the plaltform are online
> 
> Will fix.
> 
>>                                                      ^^^^^^^^^
>>> + * during early boot. They could not be prevented from being offlined
>>> + * if for some reason they are not brought online to begin with. This
>>> + * help validate the basic assumption on which the above memory event
>>> + * notifier works to prevent boot memory offlining and it's possible
>>> + * removal.
>>> + */
>>> +static bool validate_bootmem_online(void)
>>> +{
>>> +    struct memblock_region *mblk;
>>> +    struct mem_section *ms;
>>> +    unsigned long pfn, end_pfn, start, end;
>>> +    bool all_online = true;
>>> +
>>> +    /*
>>> +     * Scanning across all memblock might be expensive
>>> +     * on some big memory systems. Hence enable this
>>> +     * validation only with DEBUG_VM.
>>> +     */
>>> +    if (!IS_ENABLED(CONFIG_DEBUG_VM))
>>> +        return all_online;
>>> +
>>> +    for_each_memblock(memory, mblk) {
>>> +        pfn = PHYS_PFN(mblk->base);
>>> +        end_pfn = PHYS_PFN(mblk->base + mblk->size);
>>> +
>>
>> It's not a good idea to access @mblk->{base, size}. There are two
>> accessors: memblock_region_memory_{base, end}_pfn().
> 
> Sure, will replace.
> 
>>
>>> +        for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +            ms = __pfn_to_section(pfn);
>>> +
>>> +            /*
>>> +             * All memory ranges in the system at this point
>>> +             * should have been marked early sections.
>>> +             */
>>> +            WARN_ON(!early_section(ms));
>>> +
>>> +            /*
>>> +             * Memory notifier mechanism here to prevent boot
>>> +             * memory offlining depends on the fact that each
>>> +             * early section memory on the system is intially
>>> +             * online. Otherwise a given memory section which
>>> +             * is already offline will be overlooked and can
>>> +             * be removed completely. Call out such sections.
>>> +             */
>>
>> s/intially/initially
> 
> Will change.
> 
>>
>>> +            if (!online_section(ms)) {
>>> +                start = PFN_PHYS(pfn);
>>> +                end = start + (1UL << PA_SECTION_SHIFT);
>>> +                pr_err("Memory range [%lx %lx] is offline\n", start, end);
>>> +                pr_err("Memory range [%lx %lx] can be removed\n", start, end);
>>> +                all_online = false;
>>
>> These two error messages can be combined:
>>
>>      pr_err("Memory range [%lx %lx] not online, can't be offlined\n",
>>             start, end);
> 
> Will change but it is actually s/can't be offlined/can be removed/ instead.
> 
>>
>> I think you need to return @all_online immediately, without
>> checking if the subsequent sections are online or not? :)
> 
> Thinking about this again. It might be better if the notifier registration
> does not depend on return value from validate_bootmem_online(). Instead it
> should proceed either way but after calling out all boot memory sections
> that are not online. In that case the notifier will at least prevent removal of
> some parts of boot memory which are online.
> 

Yes, agreed. However, the most important part is to print the error
messages introduced in validate_bootmem_online().
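
To make the agreed behaviour concrete, here is a rough userspace sketch, not
kernel code and not the actual patch. It assumes a made-up section table:
struct mock_section, SECTION_SHIFT, report_offline_sections() and
setup_bootmem_protection() are all hypothetical names used only for
illustration. It shows the two points above: every section that is not online
is reported with a single combined message, and the registration step does not
depend on the scan result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel section state; none of this is kernel API. */
#define SECTION_SHIFT 27UL	/* assumed 128 MiB sections, illustration only */

struct mock_section {
	unsigned long start;	/* physical start address of the section */
	bool online;
};

/*
 * Report every section that is not online and return how many were
 * found. The scan does not stop at the first hit, so all affected
 * ranges end up in the log.
 */
static size_t report_offline_sections(const struct mock_section *secs, size_t nr)
{
	size_t i, nr_offline = 0;

	for (i = 0; i < nr; i++) {
		unsigned long end;

		if (secs[i].online)
			continue;

		end = secs[i].start + (1UL << SECTION_SHIFT);
		fprintf(stderr, "Memory range [%lx %lx] not online, can be removed\n",
			secs[i].start, end);
		nr_offline++;
	}

	return nr_offline;
}

/*
 * Registration proceeds whatever the scan finds, so the sections that
 * are online stay protected. The return value merely stands in for a
 * successful notifier registration.
 */
static bool setup_bootmem_protection(const struct mock_section *secs, size_t nr,
				     size_t *nr_offline)
{
	*nr_offline = report_offline_sections(secs, nr);
	return true;
}
```

With a table holding one online and two offline sections, the sketch would log
two ranges and still report the registration step as done.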

Cheers,
Gavin




