[PATCH 2/2] arm64/mm: Reject memory removal that splits a kernel leaf mapping

Ryan Roberts ryan.roberts at arm.com
Mon Feb 2 03:35:20 PST 2026


On 02/02/2026 11:06, Anshuman Khandual wrote:
> On 02/02/26 3:12 PM, Ryan Roberts wrote:
>> On 02/02/2026 04:26, Anshuman Khandual wrote:
>>> Linear and vmemmap mappings that get torn down during a memory hot remove
>>> operation might contain leaf level entries at any page table level. If the
>>> requested memory range's linear or vmemmap mappings fall within such leaf
>>> entries, new mappings need to be created for the remaining memory previously
>>> mapped by the leaf entry, following standard break-before-make (BBM) rules.
>>
>> I think it would be good to mention that the kernel cannot tolerate BBM, so
>> remapping to fine-grained leaves would not be possible on systems without
>> BBML2_NOABORT.
> 
> Sure, will add that.
> 
>>
>>>
>>> Currently the memory hot remove operation does not perform such restructuring,
>>> so memory ranges whose removal would split a kernel leaf level mapping need
>>> to be rejected.
>>
>> Perhaps it is useful to mention that while memory_hotplug.c does appear to
>> permit hot-unplugging arbitrary ranges of memory, the higher layers that drive
>> memory_hotplug (e.g. ACPI, virtio, ...) all appear to treat memory as fixed-size
>> devices, so it is impossible to hot-unplug a different amount than was previously
>> hotplugged. We should therefore never see a rejection in practice, but adding
>> the check makes us robust against a future change.
> 
> Agreed, will update the commit message.
> 
>>
>>>
>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>> Cc: Will Deacon <will at kernel.org>
>>> Cc: linux-arm-kernel at lists.infradead.org
>>> Cc: linux-kernel at vger.kernel.org
>>> Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
>>> Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
>>> Cc: stable at vger.kernel.org
>>> Suggested-by: Ryan Roberts <ryan.roberts at arm.com>
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>>> ---
>>>  arch/arm64/mm/mmu.c | 126 ++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 126 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 8ec8a287aaa1..9d59e10fb3de 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -2063,6 +2063,129 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
>>>  	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
>>>  }
>>>  
>>> +
>>> +static bool split_kernel_leaf_boundary(unsigned long addr)
>>
>> The name currently makes it sound like we are asking for the mapping to be split
>> (we have existing functions to do this that are named similarly). Perhaps a
>> better name would be addr_splits_leaf()?
> 
> Agreed, that name sounds a bit confusing and ambiguous as there is already
> a similarly named function. Will rename it to addr_splits_kernel_leaf()
> instead.
> 
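
addr_splits_kernel_leaf() works for me. Maybe also worth a short comment on the
function to pin down the semantics, something like:

	/*
	 * Return true if addr falls strictly inside a present kernel leaf
	 * mapping, i.e. unmapping at addr would require splitting that leaf.
	 */
	static bool addr_splits_kernel_leaf(unsigned long addr)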
>>
>>> +{
>>> +	pgd_t *pgdp, pgd;
>>> +	p4d_t *p4dp, p4d;
>>> +	pud_t *pudp, pud;
>>> +	pmd_t *pmdp, pmd;
>>> +	pte_t *ptep, pte;
>>> +
>>> +	/*
>>> +	 * PGD: If addr is PGD aligned then addr already
>>> +	 * describes a leaf boundary.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
>>> +		return false;
>>> +
>>> +	pgdp = pgd_offset_k(addr);
>>> +	pgd = pgdp_get(pgdp);
>>> +	if (!pgd_present(pgd))
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * P4D: If addr is P4D aligned then addr already
>>> +	 * describes a leaf boundary.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
>>> +		return false;
>>> +
>>> +	p4dp = p4d_offset(pgdp, addr);
>>> +	p4d = p4dp_get(p4dp);
>>> +	if (!p4d_present(p4d))
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * PUD: If addr is PUD aligned then addr already
>>> +	 * describes a leaf boundary.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
>>> +		return false;
>>> +
>>> +	pudp = pud_offset(p4dp, addr);
>>> +	pud = pudp_get(pudp);
>>> +	if (!pud_present(pud))
>>> +		return false;
>>> +
>>> +	if (pud_leaf(pud))
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * CONT_PMD: If addr is CONT_PMD aligned then
>>> +	 * addr already describes a leaf boundary.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
>>> +		return false;
>>> +
>>> +	pmdp = pmd_offset(pudp, addr);
>>> +	pmd = pmdp_get(pmdp);
>>> +	if (!pmd_present(pmd))
>>> +		return false;
>>> +
>>> +	if (pmd_leaf(pmd) && pmd_cont(pmd))
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * PMD: If addr is PMD aligned then addr already
>>> +	 * describes a leaf boundary.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>>> +		return false;
>>> +
>>> +	if (pmd_leaf(pmd))
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * CONT_PTE: If addr is CONT_PTE aligned then addr
>>> +	 * already describes a leaf boundary.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
>>> +		return false;
>>> +
>>> +	ptep = pte_offset_kernel(pmdp, addr);
>>> +	pte = __ptep_get(ptep);
>>> +	if (!pte_present(pte))
>>> +		return false;
>>> +
>>> +	if (pte_valid(pte) && pte_cont(pte))
>>
>> Why do you need pte_valid() here? You have already checked !pte_present(). Are
>> you expecting a case of present but not valid (PTE_PRESENT_INVALID)? If so, do
>> you need to consider that for the other levels too? (pmd_leaf() only checks
>> pmd_present()).
>>
>> Personally I think you can just drop the pte_valid() check here.
> 
> Added pte_valid() out of an abundance of caution but it is not really
> necessary. Sure, will drop it.
> 
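
Thanks. The tail of the walk then presumably reduces to something like this
(untested sketch):

	ptep = pte_offset_kernel(pmdp, addr);
	pte = __ptep_get(ptep);
	if (!pte_present(pte))
		return false;

	if (pte_cont(pte))
		return true;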
>>
>>> +		return true;
>>> +
>>> +	if (ALIGN_DOWN(addr, PAGE_SIZE) == addr)
>>> +		return false;
>>> +	return true;
>>> +}
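
(As a concrete example with 4K pages: a removal whose linear map start lands,
say, 256M into a 1G PUD block mapping walks down to the pud_leaf() case and
returns true, whereas a start on a 1G boundary bails out early at the PUD_SIZE
alignment check.)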
>>> +
>>> +static bool can_unmap_without_split(unsigned long pfn, unsigned long nr_pages)
>>> +{
>>> +	unsigned long linear_start, linear_end, phys_start, phys_end;
>>> +	unsigned long vmemmap_size, vmemmap_start, vmemmap_end;
>>
>> nit: do we need all these variables. Perhaps just:
>>
>> unsigned long sz, start, end, phys_start, phys_end;
>>
>> are sufficient?
> 
> Alright. I guess start and end can be re-used for both the linear and
> vmemmap mappings.
> 
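
FWIW, with that consolidation the vmemmap part would be roughly (untested,
using the new function name):

	sz = nr_pages * sizeof(struct page);
	start = (unsigned long)pfn_to_page(pfn);
	end = start + sz;
	if (addr_splits_kernel_leaf(start) ||
	    addr_splits_kernel_leaf(end)) {
		pr_warn("[%lx %lx] splits a leaf entry in vmemmap\n",
			phys_start, phys_end);
		return false;
	}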
>>
>>> +
>>> +	/* Assert linear map edges do not split a leaf entry */
>>> +	phys_start = PFN_PHYS(pfn);
>>> +	phys_end = phys_start + nr_pages * PAGE_SIZE;
>>> +	linear_start = __phys_to_virt(phys_start);
>>> +	linear_end =  __phys_to_virt(phys_end);
>>> +	if (split_kernel_leaf_boundary(linear_start) ||
>>> +	    split_kernel_leaf_boundary(linear_end)) {
>>> +		pr_warn("[%lx %lx] splits a leaf entry in linear map\n",
>>> +			phys_start, phys_end);
>>> +		return false;
>>> +	}
>>> +
>>> +	/* Assert vmemmap edges do not split a leaf entry */
>>> +	vmemmap_size = nr_pages * sizeof(struct page);
>>> +	vmemmap_start = (unsigned long) pfn_to_page(pfn);
>>
>> nit:                                   ^
>>
>> I don't think we would normally have that space?
> 
> Sure, will drop that.
> 
>>
>>> +	vmemmap_end = vmemmap_start + vmemmap_size;
>>> +	if (split_kernel_leaf_boundary(vmemmap_start) ||
>>> +	    split_kernel_leaf_boundary(vmemmap_end)) {
>>> +		pr_warn("[%lx %lx] splits a leaf entry in vmemmap\n",
>>> +			phys_start, phys_end);
>>> +		return false;
>>> +	}
>>> +	return true;
>>> +}
>>> +
>>>  /*
>>>   * This memory hotplug notifier helps prevent boot memory from being
>>>   * inadvertently removed as it blocks pfn range offlining process in
>>> @@ -2083,6 +2206,9 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
>>>  	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
>>>  		return NOTIFY_OK;
>>>  
>>> +	if (!can_unmap_without_split(pfn, arg->nr_pages))
>>> +		return NOTIFY_BAD;
>>> +
>>
>> Personally, I'd keep the bootmem check first and do this check after. That means
>> an existing warning will not change.
> 
> Makes sense, will move it after the existing bootmem check. BTW the function
> is still named prevent_bootmem_remove_notifier() although now it's going to
> check leaf boundaries as well. Should the function be renamed as well to
> something more generic, e.g. prevent_memory_remove_notifier()?

Works for me.
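
For concreteness, I'd imagine the end result looking roughly like this
(untested sketch, with the rename and reordering discussed above):

static int prevent_memory_remove_notifier(struct notifier_block *nb,
					  unsigned long action, void *data)
{
	struct memory_notify *arg = data;
	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
	unsigned long pfn = arg->start_pfn;

	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
		return NOTIFY_OK;

	/* Existing boot memory check stays first so its warning is unchanged. */
	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		unsigned long start = PFN_PHYS(pfn);
		unsigned long end = start + (1UL << PA_SECTION_SHIFT);

		...
	}

	/* Reject ranges whose edges would split a kernel leaf mapping. */
	if (!can_unmap_without_split(arg->start_pfn, arg->nr_pages))
		return NOTIFY_BAD;

	return NOTIFY_OK;
}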

> 
>>
>> Thanks,
>> Ryan
>>
>>>  	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>>  		unsigned long start = PFN_PHYS(pfn);
>>>  		unsigned long end = start + (1UL << PA_SECTION_SHIFT);
>>
> 
