[PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages

Ryan Roberts ryan.roberts at arm.com
Tue Nov 29 07:31:28 PST 2022


Hi Ard,

As promised, I ran your patch set through my test set up and have noticed a few
issues. Sorry it turned into rather a long email...

First, a quick explanation of the test suite: for every valid combination of the
parameters below, boot the host kernel on the FVP, then boot the guest kernel in
a VM, check that the guest boots all the way to a shell, then power off the
guest followed by the host to verify that shutdown is clean.

Parameters:
 - hw_pa:		[48, lpa, lpa2]
 - hw_va:		[48, 52]
 - kvm_mode:		[vhe, nvhe, protected]
 - host_page_size:	[4KB, 16KB, 64KB]
 - host_pa:		[48, 52]
 - host_va:		[48, 52]
 - host_load_addr:	[low, high]
 - guest_page_size:	[64KB]
 - guest_pa:		[52]
 - guest_va:		[52]
 - guest_load_addr:	[low, high]

When *_load_addr is 'low', the RAM is below 48 bits in (I)PA space. 'high' means
the RAM starts at 2048TB for the guest (52 bit PA); for the host there are 2
regions: one at 0x880000000000 (48 bit PA) sized to hold the kernel image only,
and another at 0x8800000000000 (52 bit PA) sized at 4GB. The FVP only allows RAM
at certain locations, and having a contiguous region cross the 48 bit boundary
is not an option, so I chose these values to ensure that the linear map size is
within 51 bits, which is a requirement for nvhe/protected mode kvm.
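As a quick sanity check of that layout, here is a throwaway Python calculation
(not part of the test suite) using the addresses quoted above:

```python
# Host "high" layout from the description above:
KERNEL_BASE = 0x880000000000   # 48 bit PA region holding the kernel image only
HIGH_BASE   = 0x8800000000000  # 52 bit PA region
HIGH_SIZE   = 4 << 30          # sized at 4GB

# Guest "high" layout: RAM starts at 2048TB.
GUEST_BASE = 2048 << 40

# The linear map has to span from the lowest to the highest RAM address,
# i.e. from the kernel image region up to the end of the 52 bit region.
span = (HIGH_BASE + HIGH_SIZE) - KERNEL_BASE

print(hex(span))
print(span < (1 << 51))        # linear map size fits within 51 bits
print(GUEST_BASE == (1 << 51)) # 2048TB is exactly 2^51
```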

In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
fixed up for the 2 different memory layouts.

Given this was designed to test my KVM changes, I was previously running these
without the host_load_addr=high option for the 4k and 16k host kernels (since
this requires your patch set). In this situation there are 132 valid configs and
all of them pass.
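For reference, the sweep is essentially a cartesian product over the parameter
list above with a validity filter applied. A minimal sketch in Python (the
`is_valid` predicate here is a placeholder; the real rules live in my test
suite and are what reduce the raw combinations down to 132/186 valid configs):

```python
from itertools import product

# Parameter space from the list above.
params = {
    "hw_pa":           ["48", "lpa", "lpa2"],
    "hw_va":           ["48", "52"],
    "kvm_mode":        ["vhe", "nvhe", "protected"],
    "host_page_size":  ["4KB", "16KB", "64KB"],
    "host_pa":         ["48", "52"],
    "host_va":         ["48", "52"],
    "host_load_addr":  ["low", "high"],
    "guest_page_size": ["64KB"],
    "guest_pa":        ["52"],
    "guest_va":        ["52"],
    "guest_load_addr": ["low", "high"],
}

def is_valid(cfg):
    # Placeholder: the actual validity rules (e.g. which host_va/page_size
    # combinations a given kernel build supports) are not spelled out here.
    return True

configs = [dict(zip(params, combo)) for combo in product(*params.values())]
valid = [c for c in configs if is_valid(c)]
print(len(configs))  # raw combinations before filtering
```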

I then rebased my changes on top of yours and added in the host_load_addr=high
option. Now there are 186 valid configs, 64 of which fail. (some of these
failures are regressions). From a quick initial triage, there are 3 failure modes:


1) 18 FAILING TESTS: Host kernel never outputs anything to console

  TF-A runs successfully, says it is jumping to the kernel, then nothing further
  is seen. I'm pretty confident that the blobs are loaded into memory correctly
  because the same framework is working for the other configs (including 64k
  kernel loaded into high memory). This affects all configs where a host kernel
  with 4k or 16k pages built with LPA2 support is loaded into high memory.


2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM

  During kernel boot, the last console log is "kvm [1]: vgic interrupt IRQ9". All
  failing tests are configured for protected KVM and are built with LPA2
  support, running on non-LPA2 HW.


3) 42 FAILING TESTS: Guest kernel never outputs anything to console

  Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
  There is no error reported, but the guest never outputs anything. I haven't
  yet worked out which config options are common to all the failures.


Finally, I removed my code, and ran with your patch set as provided. For this I
hacked up my test suite to boot the host, and ignore booting a guest. I also
didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
valid configs here, of which 4 failed. They were all the same failure mode as
(1) above. Failing configs were:

id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
------------------------------------------------------------------
40  lpa    52     4k              52       52       high
45  lpa    52     16k             52       52       high
55  lpa2   52     4k              52       52       high
60  lpa2   52     16k             52       52       high
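(For what it's worth, the parameters shared by all four failures can be read
off mechanically; a throwaway Python snippet, not part of the test suite:)

```python
# The four failing configs from the table above.
failures = [
    {"id": 40, "hw_pa": "lpa",  "hw_va": "52", "host_page_size": "4k",
     "host_pa": "52", "host_va": "52", "host_load_addr": "high"},
    {"id": 45, "hw_pa": "lpa",  "hw_va": "52", "host_page_size": "16k",
     "host_pa": "52", "host_va": "52", "host_load_addr": "high"},
    {"id": 55, "hw_pa": "lpa2", "hw_va": "52", "host_page_size": "4k",
     "host_pa": "52", "host_va": "52", "host_load_addr": "high"},
    {"id": 60, "hw_pa": "lpa2", "hw_va": "52", "host_page_size": "16k",
     "host_pa": "52", "host_va": "52", "host_load_addr": "high"},
]

# Keep only the key/value pairs shared by every failing config.
common = dict(set(failures[0].items())
              .intersection(*(set(f.items()) for f in failures[1:])))
print(common)
```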


So on the balance of probabilities, I think failure mode (1) is very likely to
be due to a bug in your code. (2) and (3) could be my issue or your issue: I
propose to dig into those a bit further and will get back to you on them. I
don't plan to look any further into (1).

Thanks,
Ryan


On 24/11/2022 12:39, Ard Biesheuvel wrote:
> Enable support for LPA2 when running with 4k or 16k pages. In the former
> case, this requires 5 level paging with a runtime fallback to 4 on
> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> pages, where we fall back to 3 level paging (47 bit virtual addressing)
> on non-LPA2 configurations. (Falling back to 48 bits would involve
> finding a workaround for the fact that we cannot construct a level 0
> table covering 52 bits of VA space that appears aligned to its size in
> memory, and has the top 2 entries that represent the 48-bit region
> appearing at an alignment of 64 bytes, which is required by the
> architecture for TTBR address values. Also, using an additional level of
> paging to translate a single VA bit is wasteful in terms of TLB
> efficiency)
> 
> This means support for falling back to 3 levels of paging at runtime
> when configured for 4 is also needed.
> 
> Another thing worth noting is that the repurposed physical address bits
> in the page table descriptors were not RES0 before, and so there is now
> a big global switch (called TCR.DS) which controls how all page table
> descriptors are interpreted. This requires some extra care in the PTE
> conversion helpers, and additional handling in the boot code to ensure
> that we set TCR.DS safely if supported (and not overridden)
> 
> Note that this series is mostly orthogonal to work by Anshuman done last
> year: this series assumes that 52-bit physical addressing is never
> needed to map the kernel image itself, and therefore that we never need
> ID map range extension to cover the kernel with a 5th level when running
> with 4. And given that the LPA2 architectural feature covers both the
> virtual and physical range extensions, where enabling the latter is
> required to enable the former, we can simplify things further by only
> enabling them as a pair. (I.e., 52-bit physical addressing cannot be
> enabled for 48-bit VA space or smaller)
> 
> This series applies onto some of my previous work that is still in
> flight, so these patches will not apply in isolation. Complete branch
> can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-4k-lpa2
> 
> It supersedes the RFC v1 I sent out last week, which covered 16k pages
> only. It also supersedes some related work I sent out in isolation
> before:
> 
> [PATCH] arm64: mm: Enable KASAN for 16k/48-bit VA configurations
> [PATCH 0/3] arm64: mm: Model LVA support as a CPU feature
> 
> Tested on QEMU with -cpu max and lpa2 both off and on, as well as using
> the arm64.nolva override kernel command line parameter. Note that this
> requires a QEMU built from the latest sources.
> 
> Cc: Marc Zyngier <maz at kernel.org>
> Cc: Will Deacon <will at kernel.org>
> Cc: Mark Rutland <mark.rutland at arm.com>
> Cc: Kees Cook <keescook at chromium.org>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Mark Brown <broonie at kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
> Cc: Richard Henderson <richard.henderson at linaro.org>
> Cc: Ryan Roberts <ryan.roberts at arm.com>
> 
> Anshuman Khandual (3):
>   arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
>   arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field
>   arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
> 
> Ard Biesheuvel (16):
>   arm64: kaslr: Adjust randomization range dynamically
>   arm64: mm: get rid of kimage_vaddr global variable
>   arm64: head: remove order argument from early mapping routine
>   arm64: mm: Handle LVA support as a CPU feature
>   arm64: mm: Deal with potential ID map extension if VA_BITS >
>     VA_BITS_MIN
>   arm64: mm: Add feature override support for LVA
>   arm64: mm: Wire up TCR.DS bit to PTE shareability fields
>   arm64: mm: Add LPA2 support to phys<->pte conversion routines
>   arm64: mm: Add definitions to support 5 levels of paging
>   arm64: mm: add 5 level paging support to G-to-nG conversion routine
>   arm64: Enable LPA2 at boot if supported by the system
>   arm64: mm: Add 5 level paging support to fixmap and swapper handling
>   arm64: kasan: Reduce minimum shadow alignment and enable 5 level
>     paging
>   arm64: mm: Add support for folding PUDs at runtime
>   arm64: ptdump: Disregard unaddressable VA space
>   arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
> 
>  arch/arm64/Kconfig                      |  23 ++-
>  arch/arm64/include/asm/assembler.h      |  42 ++---
>  arch/arm64/include/asm/cpufeature.h     |   2 +
>  arch/arm64/include/asm/fixmap.h         |   1 +
>  arch/arm64/include/asm/kernel-pgtable.h |  27 ++-
>  arch/arm64/include/asm/memory.h         |  23 ++-
>  arch/arm64/include/asm/pgalloc.h        |  53 +++++-
>  arch/arm64/include/asm/pgtable-hwdef.h  |  34 +++-
>  arch/arm64/include/asm/pgtable-prot.h   |  18 +-
>  arch/arm64/include/asm/pgtable-types.h  |   6 +
>  arch/arm64/include/asm/pgtable.h        | 197 ++++++++++++++++++--
>  arch/arm64/include/asm/sysreg.h         |   2 +
>  arch/arm64/include/asm/tlb.h            |   3 +-
>  arch/arm64/kernel/cpufeature.c          |  46 ++++-
>  arch/arm64/kernel/head.S                |  99 +++++-----
>  arch/arm64/kernel/image-vars.h          |   4 +
>  arch/arm64/kernel/pi/idreg-override.c   |  29 ++-
>  arch/arm64/kernel/pi/kaslr_early.c      |  23 ++-
>  arch/arm64/kernel/pi/map_kernel.c       | 115 +++++++++++-
>  arch/arm64/kernel/sleep.S               |   3 -
>  arch/arm64/mm/init.c                    |   2 +-
>  arch/arm64/mm/kasan_init.c              | 124 ++++++++++--
>  arch/arm64/mm/mmap.c                    |   4 +
>  arch/arm64/mm/mmu.c                     | 138 ++++++++++----
>  arch/arm64/mm/pgd.c                     |  17 +-
>  arch/arm64/mm/proc.S                    |  76 +++++++-
>  arch/arm64/mm/ptdump.c                  |   4 +-
>  arch/arm64/tools/cpucaps                |   1 +
>  28 files changed, 907 insertions(+), 209 deletions(-)
> 



