[PATCH v2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions

Mark Rutland mark.rutland at arm.com
Thu Sep 10 06:22:12 PDT 2015


Hi,

FWIW I gave this a spin on Seattle and Juno and saw no regressions (both
are pre-2.5 EFI though).

I have some concerns below.

On Wed, Sep 09, 2015 at 08:06:54AM +0100, Ard Biesheuvel wrote:
> The new Properties Table feature introduced in UEFIv2.5 may split
> memory regions that cover PE/COFF memory images into separate code
> and data regions. Since these regions only differ in the type (runtime
> code vs runtime data) and the permission bits, but not in the memory
> type attributes (UC/WC/WT/WB), the spec does not require them to be
> aligned to 64 KB.

We should require those to be 64k-aligned for permissions too. I can
imagine vendors getting permissions wrong but things happening to work
for a 64k kernel (where I assume we have to use the superset of all
permissions within a 64k page).

> As the relative offset of PE/COFF .text and .data segments cannot be
> changed on the fly, this means that we can no longer pad out those
> regions to be mappable using 64 KB pages.
> Unfortunately, there is no annotation in the UEFI memory map that
> identifies data regions that were split off from a code region, so we
> must apply this logic to all adjacent runtime regions whose attributes
> only differ in the permission bits.
> 
> So instead of rounding each memory region to 64 KB alignment at both
> ends, only round down regions that are not directly preceded by another
> runtime region with the same type attributes. Since the UEFI spec does
> not mandate that the memory map be sorted, this means we also need to
> sort it first.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
> ---
> 
> As discussed off list, this is the arm64 side of what we should backport
> to stable to prevent firmware that adheres to the current version of the
> UEFI v2.5 spec with the memprotect feature enabled from blowing up the system
> upon the first OS call into the runtime services.
> 
> For arm64, we already map things in order, but since the spec does not mandate
> a sorted memory map, we need to sort it to be sure. This also allows us to
> easily find adjacent regions with < 64 KB granularity, which the current version
> of the spec allows if they only differ in permission bits (which the spec says
> are 'unused' on AArch64, which could be interpreted as 'allowed but ignored').
> 
> Changes since v1:
> - Ensure that we don't inadvertently set the XN bit on the preceding region at
>   mapping time if the OS is running with >4 KB pages.
>   
>  arch/arm64/kernel/efi.c                 |  3 +-
>  drivers/firmware/efi/libstub/arm-stub.c | 62 +++++++++++++++-----
>  2 files changed, 49 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index e8ca6eaedd02..13671a9cf016 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void)
>  		 */
>  		if (!is_normal_ram(md))
>  			prot = __pgprot(PROT_DEVICE_nGnRE);
> -		else if (md->type == EFI_RUNTIME_SERVICES_CODE)
> +		else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> +			 !PAGE_ALIGNED(md->phys_addr))
>  			prot = PAGE_KERNEL_EXEC;

This looks coarser than necessary. For memory organised like:

0x00000000 - 0x0000F000 (60KiB) : EFI_RUNTIME_SERVICES_CODE 
0x0000F000 - 0x00020000 (68KiB) : EFI_RUNTIME_SERVICES_DATA

We should be able to make the last 64K non-executable, but with this all
128K is executable, unless I've missed something?

Maybe we could do a two-step pass, first mapping the data as
not-executable, then mapping any code pages executable (overriding any
overlapping portions, but only for the overlapping parts).
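
As a rough, untested-in-kernel sketch of that two-pass idea, applied to the
example layout above: everything here (struct sketch_region, the SKETCH_*
names, the flat per-page array) is a hypothetical userspace model and not
kernel API, and it assumes a 64K page size and regions that fit in the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x10000ULL	/* assumed 64 KiB kernel page */

/* Hypothetical stand-in for a runtime region, not efi_memory_desc_t. */
struct sketch_region {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	bool is_code;
};

enum sketch_prot { SKETCH_UNMAPPED, SKETCH_NOEXEC, SKETCH_EXEC };

/* Apply one permission to every 64K page the region touches. */
static void sketch_set_range(enum sketch_prot *pages, size_t n_pages,
			     const struct sketch_region *r,
			     enum sketch_prot prot)
{
	uint64_t first = r->start / SKETCH_PAGE_SIZE;
	uint64_t last = (r->end - 1) / SKETCH_PAGE_SIZE;
	uint64_t p;

	assert(r->end > r->start && last < n_pages);
	for (p = first; p <= last; p++)
		pages[p] = prot;
}

static void sketch_map_two_pass(enum sketch_prot *pages, size_t n_pages,
				const struct sketch_region *regions,
				size_t n_regions)
{
	size_t i;

	for (i = 0; i < n_pages; i++)
		pages[i] = SKETCH_UNMAPPED;

	/* Pass 1: map every page touched by a data region non-executable. */
	for (i = 0; i < n_regions; i++)
		if (!regions[i].is_code)
			sketch_set_range(pages, n_pages, &regions[i],
					 SKETCH_NOEXEC);

	/* Pass 2: code wins, but only on pages that code actually touches. */
	for (i = 0; i < n_regions; i++)
		if (regions[i].is_code)
			sketch_set_range(pages, n_pages, &regions[i],
					 SKETCH_EXEC);
}
```

With the two regions from the example, the first 64K page (shared by code and
data) ends up executable and the second one, which holds only data, stays
non-executable.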

>  		else
>  			prot = PAGE_KERNEL;
> diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
> index e29560e6b40b..cb4e9c4de952 100644
> --- a/drivers/firmware/efi/libstub/arm-stub.c
> +++ b/drivers/firmware/efi/libstub/arm-stub.c
> @@ -13,6 +13,7 @@
>   */
>  
>  #include <linux/efi.h>
> +#include <linux/sort.h>

sort() isn't an inline in this header. I thought it wasn't safe to call
arbitrary kernel functions from the stub?

>  #include <asm/efi.h>
>  
>  #include "efistub.h"
> @@ -305,6 +306,13 @@ fail:
>   */
>  #define EFI_RT_VIRTUAL_BASE	0x40000000
>  
> +static int cmp_mem_desc(const void *a, const void *b)
> +{
> +	const efi_memory_desc_t *left = a, *right = b;
> +
> +	return (left->phys_addr > right->phys_addr) ? 1 : -1;
> +}

Nit: please choose names to make the relationship between these clearer.
e.g. s/left/mem_a/, s/right/mem_b/.
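
Something along these lines is what I have in mind for the naming. This is
only a sketch: struct sketch_mem_desc is a hypothetical stand-in for
efi_memory_desc_t so that it builds outside the kernel, and returning 0 for
equal addresses is my own addition rather than something the patch needs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for efi_memory_desc_t (only the fields used here). */
struct sketch_mem_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Order descriptors by ascending physical base address. */
static int sketch_cmp_mem_desc(const void *a, const void *b)
{
	const struct sketch_mem_desc *mem_a = a, *mem_b = b;

	if (mem_a->phys_addr < mem_b->phys_addr)
		return -1;
	if (mem_a->phys_addr > mem_b->phys_addr)
		return 1;
	return 0;	/* same base address (assumed not to happen in a sane map) */
}
```

In userspace the comparator can be exercised with qsort(), e.g.
qsort(descs, n, sizeof(descs[0]), sketch_cmp_mem_desc).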

> +
>  /*
>   * efi_get_virtmap() - create a virtual mapping for the EFI memory map
>   *
> @@ -316,34 +324,58 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
>  		     unsigned long desc_size, efi_memory_desc_t *runtime_map,
>  		     int *count)
>  {
> +	static const u64 mem_type_mask = EFI_MEMORY_WB | EFI_MEMORY_WT |
> +					 EFI_MEMORY_WC | EFI_MEMORY_UC |
> +					 EFI_MEMORY_RUNTIME;
> +
>  	u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
> -	efi_memory_desc_t *out = runtime_map;
> +	efi_memory_desc_t *in, *prev = NULL, *out = runtime_map;
>  	int l;
>  
> -	for (l = 0; l < map_size; l += desc_size) {
> -		efi_memory_desc_t *in = (void *)memory_map + l;
> +	/*
> +	 * To work around potential issues with the Properties Table feature
> +	 * introduced in UEFI 2.5, which may split PE/COFF executable images
> +	 * in memory into several RuntimeServicesCode and RuntimeServicesData
> +	 * regions, we need to preserve the relative offsets between adjacent
> +	 * EFI_MEMORY_RUNTIME regions with the same memory type attributes.
> +	 * The easiest way to find adjacent regions is to sort the memory map
> +	 * before traversing it.
> +	 */
> +	sort(memory_map, map_size / desc_size, desc_size, cmp_mem_desc, NULL);
> +
> +	for (l = 0; l < map_size; l += desc_size, prev = in) {
>  		u64 paddr, size;
>  
> +		in = (void *)memory_map + l;
>  		if (!(in->attribute & EFI_MEMORY_RUNTIME))
>  			continue;
>  
> +		paddr = in->phys_addr;
> +		size = in->num_pages * EFI_PAGE_SIZE;
> +
>  		/*
>  		 * Make the mapping compatible with 64k pages: this allows
>  		 * a 4k page size kernel to kexec a 64k page size kernel and
>  		 * vice versa.
>  		 */
> -		paddr = round_down(in->phys_addr, SZ_64K);
> -		size = round_up(in->num_pages * EFI_PAGE_SIZE +
> -				in->phys_addr - paddr, SZ_64K);
> -
> -		/*
> -		 * Avoid wasting memory on PTEs by choosing a virtual base that
> -		 * is compatible with section mappings if this region has the
> -		 * appropriate size and physical alignment. (Sections are 2 MB
> -		 * on 4k granule kernels)
> -		 */
> -		if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
> -			efi_virt_base = round_up(efi_virt_base, SZ_2M);
> +		if (!prev ||
> +		    ((prev->attribute ^ in->attribute) & mem_type_mask) != 0 ||
> +		    paddr != (prev->phys_addr + prev->num_pages * EFI_PAGE_SIZE)) {
> +

This looks correct, though slightly painful to read. It might be nicer
with helpers like descs_have_same_attrs and descs_are_contiguous.
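
A minimal sketch of what those two helpers could look like, written against a
hypothetical struct sketch_desc rather than efi_memory_desc_t so it builds
standalone. The SKETCH_* attribute values mirror the UEFI spec's
EFI_MEMORY_UC/WC/WT/WB/RUNTIME bits, and the mask is the same set of bits as
mem_type_mask in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EFI_PAGE_SIZE	4096ULL

/* Attribute bits as defined by the UEFI spec. */
#define SKETCH_MEMORY_UC	0x0000000000000001ULL
#define SKETCH_MEMORY_WC	0x0000000000000002ULL
#define SKETCH_MEMORY_WT	0x0000000000000004ULL
#define SKETCH_MEMORY_WB	0x0000000000000008ULL
#define SKETCH_MEMORY_RUNTIME	0x8000000000000000ULL

#define SKETCH_MEM_TYPE_MASK	(SKETCH_MEMORY_WB | SKETCH_MEMORY_WT | \
				 SKETCH_MEMORY_WC | SKETCH_MEMORY_UC | \
				 SKETCH_MEMORY_RUNTIME)

/* Hypothetical stand-in for efi_memory_desc_t (only the fields used here). */
struct sketch_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* True if both regions agree on memory type and the RUNTIME bit. */
static bool descs_have_same_attrs(const struct sketch_desc *a,
				  const struct sketch_desc *b)
{
	return ((a->attribute ^ b->attribute) & SKETCH_MEM_TYPE_MASK) == 0;
}

/* True if 'next' starts exactly where 'prev' ends. */
static bool descs_are_contiguous(const struct sketch_desc *prev,
				 const struct sketch_desc *next)
{
	return next->phys_addr ==
	       prev->phys_addr + prev->num_pages * SKETCH_EFI_PAGE_SIZE;
}
```

The condition in the loop would then read roughly as
if (!prev || !descs_have_same_attrs(prev, in) || !descs_are_contiguous(prev, in)).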

Thanks,
Mark.


