[PATCH 2/2] arm64/mm: Reject memory removal that splits a kernel leaf mapping

Anshuman Khandual anshuman.khandual at arm.com
Mon Feb 2 03:06:31 PST 2026


On 02/02/26 3:12 PM, Ryan Roberts wrote:
> On 02/02/2026 04:26, Anshuman Khandual wrote:
>> Linear and vmemmap mapings that get teared down during a memory hot remove
>> operation might contain leaf level entries on any page table level. If the
>> requested memory range's linear or vmemmap mappings falls within such leaf
>> entries, new mappings need to be created for the remaning memory mapped on
>> the leaf entry earlier, following standard break before make aka BBM rules.
> 
> I think it would be good to mention that the kernel cannot tolerate BBM so
> remapping to fine grained leaves would not be possible on systems without
> BBML2_NOABORT.

Sure will add that.

> 
>>
>> Currently memory hot remove operation does not perform such restructuring,
>> and so removing memory ranges that could split a kernel leaf level mapping
>> need to be rejected.
> 
> Perhaps it is useful to mention that while memory_hotplug.c does appear to
> permit hot-unplugging arbitrary ranges of memory, the higher layers that drive
> memory_hotplug (e.g. ACPI, virtio, ...) all appear to treat memory as fixed size
> devices so it is impossible to hotunplug a different amount than was previously
> hotplugged, and so we should never see a rejection in practice, but adding the
> check makes us robust against a future change.

Agreed, will update the commit message.

> 
>>
>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>> Cc: Will Deacon <will at kernel.org>
>> Cc: linux-arm-kernel at lists.infradead.org
>> Cc: linux-kernel at vger.kernel.org
>> Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
>> Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
>> Cc: stable at vger.kernel.org
>> Suggested-by: Ryan Roberts <ryan.roberts at arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
>> ---
>>  arch/arm64/mm/mmu.c | 126 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 126 insertions(+)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 8ec8a287aaa1..9d59e10fb3de 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -2063,6 +2063,129 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
>>  	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
>>  }
>>  
>> +
>> +static bool split_kernel_leaf_boundary(unsigned long addr)
> 
> The name currently makes it sound like we are asking for the mapping to be split
> (we have existing functions to do this that are named similarly). Perhaps a
> better name would be addr_splits_leaf()?

Agreed that name sounds bit confusing and ambiguous as there is already
a similarly named function. Will rename it as addr_splits_kernel_leaf()
instead.

> 
>> +{
>> +	pgd_t *pgdp, pgd;
>> +	p4d_t *p4dp, p4d;
>> +	pud_t *pudp, pud;
>> +	pmd_t *pmdp, pmd;
>> +	pte_t *ptep, pte;
>> +
>> +	/*
>> +	 * PGD: If addr is PGD aligned then addr already
>> +	 * describes a leaf boundary.
>> +	 */
>> +	if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
>> +		return false;
>> +
>> +	pgdp = pgd_offset_k(addr);
>> +	pgd = pgdp_get(pgdp);
>> +	if (!pgd_present(pgd))
>> +		return false;
>> +
>> +	/*
>> +	 * P4D: If addr is P4D aligned then addr already
>> +	 * describes a leaf boundary.
>> +	 */
>> +	if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
>> +		return false;
>> +
>> +	p4dp = p4d_offset(pgdp, addr);
>> +	p4d = p4dp_get(p4dp);
>> +	if (!p4d_present(p4d))
>> +		return false;
>> +
>> +	/*
>> +	 * PUD: If addr is PUD aligned then addr already
>> +	 * describes a leaf boundary.
>> +	 */
>> +	if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
>> +		return false;
>> +
>> +	pudp = pud_offset(p4dp, addr);
>> +	pud = pudp_get(pudp);
>> +	if (!pud_present(pud))
>> +		return false;
>> +
>> +	if (pud_leaf(pud))
>> +		return true;
>> +
>> +	/*
>> +	 * CONT_PMD: If addr is CONT_PMD aligned then
>> +	 * addr already describes a leaf boundary.
>> +	 */
>> +	if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
>> +		return false;
>> +
>> +	pmdp = pmd_offset(pudp, addr);
>> +	pmd = pmdp_get(pmdp);
>> +	if (!pmd_present(pmd))
>> +		return false;
>> +
>> +	if (pmd_leaf(pmd) && pmd_cont(pmd))
>> +		return true;
>> +
>> +	/*
>> +	 * PMD: If addr is PMD aligned then addr already
>> +	 * describes a leaf boundary.
>> +	 */
>> +	if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>> +		return false;
>> +
>> +	if (pmd_leaf(pmd))
>> +		return true;
>> +
>> +	/*
>> +	 * CONT_PTE: If addr is CONT_PTE aligned then addr
>> +	 * already describes a leaf boundary.
>> +	 */
>> +	if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
>> +		return false;
>> +
>> +	ptep = pte_offset_kernel(pmdp, addr);
>> +	pte = __ptep_get(ptep);
>> +	if (!pte_present(pte))
>> +		return false;
>> +
>> +	if (pte_valid(pte) && pte_cont(pte))
> 
> Why do you need pte_valid() here? You have already checked !pte_present(). Are
> you expecting a case of present but not valid (PTE_PRESENT_INVALID)? If so, do
> you need to consider that for the other levels too? (pmd_leaf() only checks
> pmd_present()).
> > Personally I think you can just drop the pte_valid() check here.

Added pte_valid() for abundance of caution but it is not really
necessary though. Sure will drop it off.

> 
>> +		return true;
>> +
>> +	if (ALIGN_DOWN(addr, PAGE_SIZE) == addr)
>> +		return false;
>> +	return true;
>> +}
>> +
>> +static bool can_unmap_without_split(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	unsigned long linear_start, linear_end, phys_start, phys_end;
>> +	unsigned long vmemmap_size, vmemmap_start, vmemmap_end;
> 
> nit: do we need all these variables. Perhaps just:
> 
> unsigned long sz, start, end, phys_start, phys_end;
> 
> are sufficient?

Alright. I guess start and end can be re-used both for linear and
vmemmap mapping.

> 
>> +
>> +	/* Assert linear map edges do not split a leaf entry */
>> +	phys_start = PFN_PHYS(pfn);
>> +	phys_end = phys_start + nr_pages * PAGE_SIZE;
>> +	linear_start = __phys_to_virt(phys_start);
>> +	linear_end =  __phys_to_virt(phys_end);
>> +	if (split_kernel_leaf_boundary(linear_start) ||
>> +	    split_kernel_leaf_boundary(linear_end)) {
>> +		pr_warn("[%lx %lx] splits a leaf entry in linear map\n",
>> +			phys_start, phys_end);
>> +		return false;
>> +	}
>> +
>> +	/* Assert vmemmap edges do not split a leaf entry */
>> +	vmemmap_size = nr_pages * sizeof(struct page);
>> +	vmemmap_start = (unsigned long) pfn_to_page(pfn);
> 
> nit:                                   ^
> 
> I don't think we would normally have that space?

Sure will drop that.

> 
>> +	vmemmap_end = vmemmap_start + vmemmap_size;
>> +	if (split_kernel_leaf_boundary(vmemmap_start) ||
>> +	    split_kernel_leaf_boundary(vmemmap_end)) {
>> +		pr_warn("[%lx %lx] splits a leaf entry in vmemmap\n",
>> +			phys_start, phys_end);
>> +		return false;
>> +	}
>> +	return true;
>> +}
>> +
>>  /*
>>   * This memory hotplug notifier helps prevent boot memory from being
>>   * inadvertently removed as it blocks pfn range offlining process in
>> @@ -2083,6 +2206,9 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
>>  	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
>>  		return NOTIFY_OK;
>>  
>> +	if (!can_unmap_without_split(pfn, arg->nr_pages))
>> +		return NOTIFY_BAD;
>> +
> 
> Personally, I'd keep the bootmem check first and do this check after. That means
> an existing warning will not change.

Makes sense, will move it after existing bootmem check. BTW the function
still named as prevent_bootmem_remove_notifier() although now it's going
to check leaf boundaries as well. Should the function be renamed as well
to something more generic e.g prevent_memory_remove_notifier() ?

> 
> Thanks,
> Ryan
> 
>>  	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>  		unsigned long start = PFN_PHYS(pfn);
>>  		unsigned long end = start + (1UL << PA_SECTION_SHIFT);
> 




More information about the linux-arm-kernel mailing list