[PATCH 06/10] arm64/efi: use UEFI memory map unconditionally if available

Mark Rutland mark.rutland at arm.com
Thu Oct 23 12:14:34 PDT 2014


[...]

> > > It also looks like EFI_BOOT flag will be set even if uefi_init fails.
> > > If uefi_init fails, we only need reserve_regions() for the purpose
> > > of adding memblocks. Otherwise, we end up wasting a lot of memory.
> > 
> > What memory are we wasting in that case? Surely the only items that we
> > could choose to throw away if we failed uefi_init are
> > EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME?
> > 
> > We might want to keep those around so we can kexec into a kernel where
> > we can make use of them. Surely they shouldn't take up a significant
> > proportion of the available memory?
> 
> In addition, reserve_regions() also reserves BOOT_SERVICES (which gets
> freed up later after set_virtual_address_map(). That won't happen if
> uefi_init() fails. In any case, if uefi_init() fails, we won't be able
> to find the ACPI tables kexec kernel won't be able to either.

If you need the ACPI tables, the first kernel won't boot. However, if we
fail to map the runtime services data and code, that doesn't mean the
next kernel can't. We might have failed to map the runtime services due
to an early_ioremap bug or similar, and that could be fixed in the
kernel we're about to kexec to.

If we're going to nuke runtime regions, we should nuke them in the
memory map descriptors as a courtesy to the next kernel. Otherwise we
could be more courteous and simply leave them be (which is my
preference).

> The BOOT_SERVICES stuff is the big chunk. There was some discussion of
> not reserving that but no one has gotten around to writing the patch
> to clean out the code that does it.

I've just spent some time doing so; patch below. It boots happily on my
Juno (with 4KiB pages at least), but I haven't given it much of a stress
test.

I wasn't sure if unusable memory could show up with EFI_MEMORY_WB
attributes. If so, this patch still won't prevent that from being mapped
as cacheable, but that should be easy to fix.

Thanks,
Mark.

---->8----
>From 581d5bac5b4a6f93e23737012b71c58d809af6bb Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland at arm.com>
Date: Thu, 23 Oct 2014 19:41:55 +0100
Subject: [PATCH] arm64: efi: Simplify memory descriptor handling

Currently discovering the memory map from UEFI is a multi-stage affair,
where the boot services code and data are kept around for much longer
than necessary. Additionally, potential overlap between memory and IO
mappings at kernel page granularity is not handled.

This patch attempts to solve both of these issues, with the addition and
filtering of memory regions being handled in a single function.

First, all normal memory (i.e. anything which can be mapped writeback
cacheable) is added to memblock. Second, IO and reserved regions are
filtered out of memblock at kernel page granularity, to prevent
potential conflicting mappings.

As some reserved regions must be present in the idmap these are
reserved. Everything else that's not normal memory is removed,
preventing the possibility of a cacheable mapping covering something it
shouldn't.

Signed-off-by: Mark Rutland <mark.rutland at arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org
Cc: Catalin Marinas <catalin.marinas at arm.com>
Cc: Leif Lindholm <leif.lindholm at linaro.org
Cc: Mark Salter <msalter at redhat.com>
---
 arch/arm64/kernel/efi.c | 181 +++++++++++-------------------------------------
 1 file changed, 42 insertions(+), 139 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 03aaa99..8873c28 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -154,161 +154,66 @@ static __init int is_reserve_region(efi_memory_desc_t *md)
 	return 0;
 }
 
-static __init void reserve_regions(void)
+static __init void efi_add_memory(void)
 {
 	efi_memory_desc_t *md;
-	u64 paddr, npages, size;
+	u64 paddr, size;
 
 	if (uefi_debug)
-		pr_info("Processing EFI memory map:\n");
+		pr_info("EFI: adding normal memory:\n");
 
 	for_each_efi_memory_desc(&memmap, md) {
+		if (!is_normal_ram(md))
+			continue;
+
 		paddr = md->phys_addr;
-		npages = md->num_pages;
+		size = md->num_pages << EFI_PAGE_SHIFT;
 
 		if (uefi_debug)
-			pr_info("  0x%012llx-0x%012llx [%s]",
-				paddr, paddr + (npages << EFI_PAGE_SHIFT) - 1,
+			pr_info("  0x%012llx-0x%012llx [%s]\n",
+				paddr, size - 1,
 				memory_type_name[md->type]);
 
-		memrange_efi_to_native(&paddr, &npages);
-		size = npages << PAGE_SHIFT;
+		early_init_dt_add_memory_arch(paddr, size);
+	}
+
+	/*
+	 * Regions describe in an EFI_MEMORY_DESCRIPTOR are only guaranteed to
+	 * be 4KiB aligned but the kernel page size could be larger, and
+	 * regions with different attributes could fall into the same kernel
+	 * page. Thus we must remove any memory in the same kernel page as an
+	 * IO region.
+	 *
+	 * Reserved regions of memory need to be in the idmap, so just reserve
+	 * them.
+	 */
+	if (uefi_debug)
+		pr_info("EFI: correcting for overlapping regions:\n");
+
+	for_each_efi_memory_desc(&memmap, md) {
+		if (!is_reserve_region(md) && is_normal_ram(md))
+			continue;
+
+		paddr = md->phys_addr;
+		size = md->num_pages << EFI_PAGE_SHIFT;
+
+		size = PAGE_ALIGN(size + (paddr & ~PAGE_MASK));
+		paddr &= PAGE_MASK;
 
-		if (is_normal_ram(md))
-			early_init_dt_add_memory_arch(paddr, size);
+		if (uefi_debug)
+			pr_info("  0x%012llx-0x%012llx [%s]\n",
+				paddr, size - 1,
+				memory_type_name[md->type]);
 
-		if (is_reserve_region(md) ||
-		    md->type == EFI_BOOT_SERVICES_CODE ||
-		    md->type == EFI_BOOT_SERVICES_DATA) {
+		if (is_reserve_region(md))
 			memblock_reserve(paddr, size);
-			if (uefi_debug)
-				pr_cont("*");
-		}
-
-		if (uefi_debug)
-			pr_cont("\n");
+		else
+			memblock_remove(paddr, size);
 	}
 
 	set_bit(EFI_MEMMAP, &efi.flags);
 }
 
-
-static u64 __init free_one_region(u64 start, u64 end)
-{
-	u64 size = end - start;
-
-	if (uefi_debug)
-		pr_info("  EFI freeing: 0x%012llx-0x%012llx\n",	start, end - 1);
-
-	free_bootmem_late(start, size);
-	return size;
-}
-
-static u64 __init free_region(u64 start, u64 end)
-{
-	u64 map_start, map_end, total = 0;
-
-	if (end <= start)
-		return total;
-
-	map_start = (u64)memmap.phys_map;
-	map_end = PAGE_ALIGN(map_start + (memmap.map_end - memmap.map));
-	map_start &= PAGE_MASK;
-
-	if (start < map_end && end > map_start) {
-		/* region overlaps UEFI memmap */
-		if (start < map_start)
-			total += free_one_region(start, map_start);
-
-		if (map_end < end)
-			total += free_one_region(map_end, end);
-	} else
-		total += free_one_region(start, end);
-
-	return total;
-}
-
-static void __init free_boot_services(void)
-{
-	u64 total_freed = 0;
-	u64 keep_end, free_start, free_end;
-	efi_memory_desc_t *md;
-
-	/*
-	 * If kernel uses larger pages than UEFI, we have to be careful
-	 * not to inadvertantly free memory we want to keep if there is
-	 * overlap at the kernel page size alignment. We do not want to
-	 * free is_reserve_region() memory nor the UEFI memmap itself.
-	 *
-	 * The memory map is sorted, so we keep track of the end of
-	 * any previous region we want to keep, remember any region
-	 * we want to free and defer freeing it until we encounter
-	 * the next region we want to keep. This way, before freeing
-	 * it, we can clip it as needed to avoid freeing memory we
-	 * want to keep for UEFI.
-	 */
-
-	keep_end = 0;
-	free_start = 0;
-
-	for_each_efi_memory_desc(&memmap, md) {
-		u64 paddr, npages, size;
-
-		if (is_reserve_region(md)) {
-			/*
-			 * We don't want to free any memory from this region.
-			 */
-			if (free_start) {
-				/* adjust free_end then free region */
-				if (free_end > md->phys_addr)
-					free_end -= PAGE_SIZE;
-				total_freed += free_region(free_start, free_end);
-				free_start = 0;
-			}
-			keep_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
-			continue;
-		}
-
-		if (md->type != EFI_BOOT_SERVICES_CODE &&
-		    md->type != EFI_BOOT_SERVICES_DATA) {
-			/* no need to free this region */
-			continue;
-		}
-
-		/*
-		 * We want to free memory from this region.
-		 */
-		paddr = md->phys_addr;
-		npages = md->num_pages;
-		memrange_efi_to_native(&paddr, &npages);
-		size = npages << PAGE_SHIFT;
-
-		if (free_start) {
-			if (paddr <= free_end)
-				free_end = paddr + size;
-			else {
-				total_freed += free_region(free_start, free_end);
-				free_start = paddr;
-				free_end = paddr + size;
-			}
-		} else {
-			free_start = paddr;
-			free_end = paddr + size;
-		}
-		if (free_start < keep_end) {
-			free_start += PAGE_SIZE;
-			if (free_start >= free_end)
-				free_start = 0;
-		}
-	}
-	if (free_start)
-		total_freed += free_region(free_start, free_end);
-
-	if (total_freed)
-		pr_info("Freed 0x%llx bytes of EFI boot services memory",
-			total_freed);
-}
-
 void __init efi_init(void)
 {
 	struct efi_fdt_params params;
@@ -330,7 +235,7 @@ void __init efi_init(void)
 	if (uefi_init() < 0)
 		return;
 
-	reserve_regions();
+	efi_add_memory();
 }
 
 void __init efi_idmap_init(void)
@@ -452,8 +357,6 @@ static int __init arm64_enter_virtual_mode(void)
 
 	kfree(virtmap);
 
-	free_boot_services();
-
 	if (status != EFI_SUCCESS) {
 		pr_err("Failed to set EFI virtual address map! [%lx]\n",
 			status);
-- 
1.9.1




More information about the linux-arm-kernel mailing list