[PATCH V3 2/2] arm64/mm: Reject memory removal that splits a kernel leaf mapping

David Hildenbrand (Arm) david at kernel.org
Mon Mar 2 07:51:30 PST 2026


On 2/24/26 07:24, Anshuman Khandual wrote:
> Linear and vmemmap mappings that get teared down during a memory hot remove

s/teared/torn/ ?

> operation might contain leaf level entries on any page table level. If the
> requested memory range's linear or vmemmap mappings falls within such leaf
> entries, new mappings need to be created for the remaining memory mapped on
> the leaf entry earlier, following standard break before make aka BBM rules.
> But kernel cannot tolerate BBM and hence remapping to fine grained leaves
> would not be possible on systems without BBML2_NOABORT.
> 
> Currently memory hot remove operation does not perform such restructuring,
> and so removing memory ranges that could split a kernel leaf level mapping
> need to be rejected.
> 
> While memory_hotplug.c does appear to permit hot removing arbitrary ranges
> of memory, the higher layers that drive memory_hotplug (e.g. ACPI, virtio,
> ...) all appear to treat memory as fixed size devices. So it is impossible
> to hot unplug a different amount than was previously hot plugged, and hence
> we should never see a rejection in practice, but adding the check makes us
> robust against a future change.

Is this then really a fix that warrants

  Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
  Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")

... if nothing is broken?

I can understand the desire for this check, but I am confused about Fixes:


> 
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Will Deacon <will at kernel.org>
> Cc: linux-arm-kernel at lists.infradead.org
> Cc: linux-kernel at vger.kernel.org
> Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
> Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
> Reviewed-by: Ryan Roberts <ryan.roberts at arm.com>
> Suggested-by: Ryan Roberts <ryan.roberts at arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual at arm.com>
> ---
>  arch/arm64/mm/mmu.c | 155 ++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 149 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dfb61d218579..ce0b966f42d1 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -2063,6 +2063,142 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
>  	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
>  }
>  
> +
> +static bool addr_splits_kernel_leaf(unsigned long addr)
> +{
> +	pgd_t *pgdp, pgd;
> +	p4d_t *p4dp, p4d;
> +	pud_t *pudp, pud;
> +	pmd_t *pmdp, pmd;
> +	pte_t *ptep, pte;
> +
> +	/*
> +	 * PGD level:

I consider all these "XXX level:" comments unhelpful. It's mostly there in
the code already: PGDIR_SIZE, P4D_SIZE, PUD_SIZE ... :)

> +	 *
> +	 * If addr is PGD_SIZE aligned - already on a leaf boundary

Similarly, repeating that is not particularly helpful.

You could have a single comment at the very top that says

"If the given address points at a the start address of a possible leaf, we
certainly won't split. Otherwise, check if we would actually split a leaf by
traversing the page tables further."

> +	 */
> +	if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)

if (IS_ALIGNED(addr, PGDIR_SIZE))
	return false;

?

> +		return false;
> +
> +	pgdp = pgd_offset_k(addr);
> +	pgd = pgdp_get(pgdp);
> +	if (!pgd_present(pgd))
> +		return false;

How could we end up with non-present areas in a range we hotplugged earlier?

> +
> +	/*
> +	 * P4D level:
> +	 *
> +	 * If addr is P4D_SIZE aligned - already on a leaf boundary
> +	 */
> +	if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
> +		return false;
> +
> +	p4dp = p4d_offset(pgdp, addr);
> +	p4d = p4dp_get(p4dp);
> +	if (!p4d_present(p4d))
> +		return false;
> +
> +	/*
> +	 * PUD level:
> +	 *
> +	 * If addr is PUD_SIZE aligned - already on a leaf boundary
> +	 */
> +	if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
> +		return false;
> +
> +	pudp = pud_offset(p4dp, addr);
> +	pud = pudp_get(pudp);
> +	if (!pud_present(pud))
> +		return false;
> +
> +	if (pud_leaf(pud))
> +		return true;
> +
> +	/*
> +	 * CONT_PMD level:
> +	 *
> +	 * If addr is CONT_PMD_SIZE aligned - already on a leaf boundary
> +	 */
> +	if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
> +		return false;
> +
> +	pmdp = pmd_offset(pudp, addr);
> +	pmd = pmdp_get(pmdp);
> +	if (!pmd_present(pmd))
> +		return false;
> +
> +	if (pmd_cont(pmd))
> +		return true;
> +
> +	/*
> +	 * PMD level:
> +	 *
> +	 * If addr is PMD_SIZE aligned - already on a leaf boundary
> +	 */
> +	if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
> +		return false;
> +
> +	if (pmd_leaf(pmd))
> +		return true;
> +
> +	/*
> +	 * CONT_PTE level:
> +	 *
> +	 * If addr is CONT_PTE_SIZE aligned - already on a leaf boundary
> +	 */
> +	if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
> +		return false;
> +
> +	ptep = pte_offset_kernel(pmdp, addr);
> +	pte = __ptep_get(ptep);
> +	if (!pte_present(pte))
> +		return false;
> +
> +	if (pte_cont(pte))
> +		return true;
> +
> +	/*
> +	 * PTE level:
> +	 *
> +	 * If addr is PAGE_SIZE aligned - already on a leaf boundary
> +	 */
> +	if (ALIGN_DOWN(addr, PAGE_SIZE) == addr)
> +		return false;
> +	return true;

	return !IS_ALIGNED(addr, PAGE_SIZE);

> +}
> +
> +static bool can_unmap_without_split(unsigned long pfn, unsigned long nr_pages)
> +{
> +	unsigned long phys_start, phys_end, size, start, end;
> +
> +	phys_start = PFN_PHYS(pfn);
> +	phys_end = phys_start + nr_pages * PAGE_SIZE;
> +
> +	/*
> +	 * PFN range's linear map edges are leaf entry aligned
> +	 */
> +	start = __phys_to_virt(phys_start);
> +	end =  __phys_to_virt(phys_end);
> +	if (addr_splits_kernel_leaf(start) || addr_splits_kernel_leaf(end)) {
> +		pr_warn("[%lx %lx] splits a leaf entry in linear map\n",
> +			phys_start, phys_end);
> +		return false;
> +	}
> +
> +	/*
> +	 * PFN range's vmemmap edges are leaf entry aligned
> +	 */
> +	size = nr_pages * sizeof(struct page);
> +	start = (unsigned long)pfn_to_page(pfn);
> +	end = start + size;

As arm64 cannot be used with CONFIG_SPARSEMEM (only with
CONFIG_SPARSEMEM_VMEMMAP, as it sets SPARSEMEM_VMEMMAP_ENABLE),
I think you can just do

	start = (unsigned long)pfn_to_page(pfn);
	end = (unsigned long)pfn_to_page(pfn + nr_pages);

As pfn_to_page() is just a simple calculation, this works even if "pfn + nr_pages" does not exist.

> +	if (addr_splits_kernel_leaf(start) || addr_splits_kernel_leaf(end)) {
> +		pr_warn("[%lx %lx] splits a leaf entry in vmemmap\n",
> +			phys_start, phys_end);
> +		return false;
> +	}
> +	return true;
> +}
> +

-- 
Cheers,

David


