[PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
Marc Zyngier
maz at kernel.org
Tue Mar 30 13:44:25 BST 2021
On Tue, 30 Mar 2021 12:21:26 +0100,
Ard Biesheuvel <ardb at kernel.org> wrote:
>
> Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> configurations") introduced a new layout for the 52-bit VA space, in
> order to maximize the space available to the linear region. After this
> change, the kernel VA space is no longer split 1:1 down the middle, and
> as it turns out, this violates an assumption in the KVM init code when
> it chooses the layout for the nVHE EL2 mapping.
>
> Given that EFI does not support 52-bit VA addressing (as it only
> supports 4k pages), and that in general, loaders cannot assume that the
> kernel being loaded supports 52-bit VA/PA addressing in the first place,
> we can safely assume that the kernel, and therefore the .idmap section,
> will be 48-bit addressable on 52-bit VA capable systems.
>
> So in this case, organize the nVHE EL2 address space as a 2^48 byte
> window starting at address 0x0, containing the ID map and the
> hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> size, so it is slightly larger, but this only matters on systems where
> the DRAM footprint in the physical memory map exceeds 3968 TB)
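(Spelling out the arithmetic: 2^47 bytes is 128 TB, 2^48 bytes is
256 TB, and 2^52 bytes is 4096 TB, so EL1's linear region is
4096 - 128 = 3968 TB while the proposed EL2 one is only
4096 - 256 = 3840 TB, i.e. a 128 TB slice is missing at the top.)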
So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
necessarily because I have that much memory, but because my system has
multiple memory banks, one of which lands on that spot, I cannot map
such memory at EL2. We'll explode at run time.
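To make that hole concrete, here is a toy predicate (made-up helper,
not kernel code) for whether a PA still has an EL2 alias under the
proposed layout:

#include <stdbool.h>
#include <stdint.h>

/*
 * Proposed layout: EL2's linear region covers PAs [0, 2^52 - 2^48),
 * while EL1's covers [0, 2^52 - 2^47). Anything in the top 2^47-byte
 * slice is EL1-visible but has no EL2 alias. Illustration only.
 */
static bool pa_has_el2_alias(uint64_t pa)
{
	return pa < (1ULL << 52) - (1ULL << 48);
}

/* A bank at PA 0xf400000000000 is fine at EL1, but fails this test. */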
Can we keep the private mapping to 47 bits and restore the missing
chunk to the linear mapping? Of course, it means that the linear map
is now potentially not linear anymore, so we'd have to guarantee that
the kernel lives in the first 2^47 bytes instead. Crap.
>
> Fixes: f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA configurations")
> Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> ---
> Documentation/arm64/booting.rst | 6 +++---
> arch/arm64/kvm/va_layout.c | 18 ++++++++++++++----
> 2 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> index 7552dbc1cc54..418ec9b63d2c 100644
> --- a/Documentation/arm64/booting.rst
> +++ b/Documentation/arm64/booting.rst
> @@ -121,8 +121,8 @@ Header notes:
> to the base of DRAM, since memory below it is not
> accessible via the linear mapping
> 1
> - 2MB aligned base may be anywhere in physical
> - memory
> + 2MB aligned base may be anywhere in the 48-bit
> + addressable physical memory region
> Bits 4-63 Reserved.
> ============= ===============================================================
>
> @@ -132,7 +132,7 @@ Header notes:
> depending on selected features, and is effectively unbound.
>
> The Image must be placed text_offset bytes from a 2MB aligned base
> -address anywhere in usable system RAM and called there. The region
> +address in 48-bit addressable system RAM and called there. The region
> between the 2 MB aligned base address and the start of the image has no
> special significance to the kernel, and may be used for other purposes.
> At least image_size bytes from the start of the image must be free for
> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> index 978301392d67..e9ab449de197 100644
> --- a/arch/arm64/kvm/va_layout.c
> +++ b/arch/arm64/kvm/va_layout.c
> @@ -62,9 +62,19 @@ __init void kvm_compute_layout(void)
> phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
> u64 hyp_va_msb;
>
> - /* Where is my RAM region? */
> - hyp_va_msb = idmap_addr & BIT(vabits_actual - 1);
> - hyp_va_msb ^= BIT(vabits_actual - 1);
> + /*
> + * On LVA capable hardware, the kernel is guaranteed to reside
> + * in the 48-bit addressable part of physical memory, and so
> + * the idmap will be located there as well. Put the EL2 linear
> + * region right after it, where it can grow upward to fill the
> + * entire 52-bit VA region.
> + */
> + if (vabits_actual > VA_BITS_MIN) {
> + hyp_va_msb = BIT(VA_BITS_MIN);
> + } else {
> + hyp_va_msb = idmap_addr & BIT(vabits_actual - 1);
> + hyp_va_msb ^= BIT(vabits_actual - 1);
> + }
>
> tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
> (u64)(high_memory - 1));
> @@ -72,7 +82,7 @@ __init void kvm_compute_layout(void)
> va_mask = GENMASK_ULL(tag_lsb - 1, 0);
> tag_val = hyp_va_msb;
>
> - if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (vabits_actual - 1)) {
> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb < (vabits_actual - 1)) {
> /* We have some free bits to insert a random tag. */
> tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
> }
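(As a sanity check, a stand-alone model of the hunk above with
made-up inputs -- this is an illustration, not the kernel code:)

#include <stdint.h>
#include <stdio.h>

#define VA_BITS_MIN	48

static uint64_t pick_hyp_va_msb(uint64_t idmap_addr, unsigned int vabits_actual)
{
	if (vabits_actual > VA_BITS_MIN)
		/* 52-bit VA: linear alias starts right above the 2^48 window */
		return 1ULL << VA_BITS_MIN;

	/* otherwise: use whichever half of the VA space the idmap isn't in */
	return (idmap_addr & (1ULL << (vabits_actual - 1))) ^
	       (1ULL << (vabits_actual - 1));
}

int main(void)
{
	/* 52-bit system, idmap at a made-up 48-bit addressable PA */
	printf("%#llx\n",
	       (unsigned long long)pick_hyp_va_msb(0x80000000ULL, 52));
	/* prints 0x1000000000000, i.e. the EL2 linear region base */
	return 0;
}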
It seems __create_hyp_private_mapping() still refers to (VA_BITS - 1)
to choose where to allocate the IO mappings, and
__pkvm_create_private_mapping() relies on similar assumptions, going
by what hyp_create_idmap() does.
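From memory, the check in question is along these lines (paraphrased
and boiled down to plain C, not a verbatim quote of mmu.c):

#include <stdint.h>

#define VA_BITS		48
#define PAGE_SIZE	4096ULL
#define BIT_ULL(n)	(1ULL << (n))
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

static uint64_t io_map_base = BIT_ULL(VA_BITS - 1) - BIT_ULL(20); /* made up */

int alloc_hyp_io_va(uint64_t size, uint64_t *haddr)
{
	uint64_t base = ALIGN_DOWN(io_map_base - size, PAGE_SIZE);

	/*
	 * The half-space assumption: if BIT(VA_BITS - 1) flips, the
	 * allocation escaped the private mapping range. That test no
	 * longer matches the 2^48 window chosen above on 52-bit VA
	 * systems.
	 */
	if ((base ^ io_map_base) & BIT_ULL(VA_BITS - 1))
		return -1;

	io_map_base = base;
	*haddr = base;
	return 0;
}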
Thanks,
M.
--
Without deviation from the norm, progress is not possible.