[PATCH V4 3/3] arm64/mm/hotplug: Ensure early memory sections are all online
Gavin Shan
gshan at redhat.com
Mon Oct 12 00:07:51 EDT 2020
Hi Anshuman,
On 10/6/20 2:11 PM, Anshuman Khandual wrote:
> On 10/01/2020 06:23 AM, Gavin Shan wrote:
>> On 9/29/20 11:54 PM, Anshuman Khandual wrote:
>>> This adds a validation function that scans the entire boot memory and makes
>>> sure that all early memory sections are online. This check is essential for
>>> the memory notifier to work properly, as it cannot prevent any boot memory
>>> from offlining, if all sections are not online to begin with. The notifier
>>> registration is skipped if this validation does not go through. However,
>>> the boot section scanning is enabled only with DEBUG_VM.
>>>
>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>> Cc: Will Deacon <will at kernel.org>
>>> Cc: Mark Rutland <mark.rutland at arm.com>
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Cc: Steve Capper <steve.capper at arm.com>
>>> Cc: Mark Brown <broonie at kernel.org>
>>> Cc: linux-arm-kernel at lists.infradead.org
>>> Cc: linux-kernel at vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>>> ---
>>> arch/arm64/mm/mmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 59 insertions(+)
>>
>> I don't understand why this is necessary. The core already ensures the
>> corresponding section is online when trying to offline it. It's guaranteed
>> that the section is online when the notifier is triggered. I'm not sure if
>> there is anything I missed?
>
> Current memory notifier blocks any boot memory hot removal attempt via
> blocking its offlining step itself. So if some sections in boot memory
> are not online (because of a bug or change in init sequence) by the
> time memory block device can be removed, the notifier loses the ability
> to prevent its removal. This validation ensures that the entire boot
> memory is in online state, and otherwise calls out the sections that are
> not, with a warning that those boot memory ranges can be removed.
>
Well, I think that should be very rare. I guess you haven't observed the
erroneous case so far? However, I think it's fine to add the check
since it's only enabled with CONFIG_DEBUG_VM.
>>
>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 90a30f5ebfc0..b67a657ea1ad 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1522,6 +1522,62 @@ static struct notifier_block prevent_bootmem_remove_nb = {
>>> .notifier_call = prevent_bootmem_remove_notifier,
>>> };
>>> +/*
>>> + * This ensures that boot memory sections on the plaltform are online
>
> Will fix.
>
>> ^^^^^^^^^
>>> + * during early boot. They could not be prevented from being offlined
>>> + * if for some reason they are not brought online to begin with. This
>>> + * helps validate the basic assumption on which the above memory event
>>> + * notifier works to prevent boot memory offlining and its possible
>>> + * removal.
>>> + */
>>> +static bool validate_bootmem_online(void)
>>> +{
>>> + struct memblock_region *mblk;
>>> + struct mem_section *ms;
>>> + unsigned long pfn, end_pfn, start, end;
>>> + bool all_online = true;
>>> +
>>> + /*
>>> + * Scanning across all memblock might be expensive
>>> + * on some big memory systems. Hence enable this
>>> + * validation only with DEBUG_VM.
>>> + */
>>> + if (!IS_ENABLED(CONFIG_DEBUG_VM))
>>> + return all_online;
>>> +
>>> + for_each_memblock(memory, mblk) {
>>> + pfn = PHYS_PFN(mblk->base);
>>> + end_pfn = PHYS_PFN(mblk->base + mblk->size);
>>> +
>>
>> It's not a good idea to access @mblk->{base, size}. There are two
>> accessors: memblock_region_memory_{base, end}_pfn().
>
> Sure, will replace.
>
>>
>>> + for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> + ms = __pfn_to_section(pfn);
>>> +
>>> + /*
>>> + * All memory ranges in the system at this point
>>> + * should have been marked early sections.
>>> + */
>>> + WARN_ON(!early_section(ms));
>>> +
>>> + /*
>>> + * Memory notifier mechanism here to prevent boot
>>> + * memory offlining depends on the fact that each
>>> + * early section memory on the system is intially
>>> + * online. Otherwise a given memory section which
>>> + * is already offline will be overlooked and can
>>> + * be removed completely. Call out such sections.
>>> + */
>>
>> s/intially/initially
>
> Will change.
>
>>
>>> + if (!online_section(ms)) {
>>> + start = PFN_PHYS(pfn);
>>> + end = start + (1UL << PA_SECTION_SHIFT);
>>> + pr_err("Memory range [%lx %lx] is offline\n", start, end);
>>> + pr_err("Memory range [%lx %lx] can be removed\n", start, end);
>>> + all_online = false;
>>
>> These two error messages can be combined:
>>
>> pr_err("Memory range [%lx %lx] not online, can't be offlined\n",
>> start, end);
>
> Will change but it is actually s/can't be offlined/can be removed/ instead.
>
>>
>> I think you need to return @all_online immediately, without
>> checking if the subsequent sections are online or not? :)
>
> Thinking about this again. It might be better if the notifier registration
> does not depend on return value from validate_bootmem_online(). Instead it
> should proceed either way but after calling out all boot memory sections
> that are not online. In that case the notifier will at least prevent removal of
> some parts of boot memory which are online.
>
Yes, agreed. However, the most important part is to print the error
messages introduced in validate_bootmem_online().
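
To make the agreed behaviour concrete, here is a minimal userspace sketch,
not kernel code. It assumes a made-up struct model_section and the
hypothetical helpers model_report_offline() and model_init(), none of which
exist in the patch. It only illustrates the control flow: scan every section,
call out each one that is offline with the combined message, and then go on
to register the notifier regardless of the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a memory section; not struct mem_section. */
struct model_section {
	unsigned long start;	/* physical start address */
	unsigned long size;	/* section size in bytes */
	bool online;
};

/*
 * Walk all sections and report every offline one instead of stopping
 * at the first. Returns the number of offline sections found.
 */
size_t model_report_offline(const struct model_section *secs, size_t nr)
{
	size_t i, offline = 0;

	assert(secs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (secs[i].online)
			continue;
		fprintf(stderr,
			"Memory range [%lx %lx] not online, can be removed\n",
			secs[i].start, secs[i].start + secs[i].size);
		offline++;
	}
	return offline;
}

/*
 * Registration does not depend on the scan result. The return value
 * stands in for "notifier registered" in this model.
 */
bool model_init(const struct model_section *secs, size_t nr,
		size_t *nr_offline)
{
	*nr_offline = model_report_offline(secs, nr);
	return true;
}
```

With three sections of which two are offline, model_init() reports both
ranges and still returns true, so the sections that are online remain
protected by the notifier.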
Cheers,
Gavin