[PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages

Ard Biesheuvel ardb at kernel.org
Tue Nov 29 08:56:44 PST 2022


On Tue, 29 Nov 2022 at 17:36, Ryan Roberts <ryan.roberts at arm.com> wrote:
>
> On 29/11/2022 15:47, Ard Biesheuvel wrote:
> > On Tue, 29 Nov 2022 at 16:31, Ryan Roberts <ryan.roberts at arm.com> wrote:
> >>
> >> Hi Ard,
> >>
> >> As promised, I ran your patch set through my test set up and have noticed a few
> >> issues. Sorry it turned into rather a long email...
> >>
> >
> > No worries, and thanks a lot for going through the trouble.
> >
> >> First, a quick explanation of the test suite: For all valid combinations of the
> >> below parameters, boot the host kernel on the FVP, then boot the guest kernel in
> >> a VM, check that booting succeeds all the way to the guest shell then poweroff
> >> guest followed by host can check shutdown is clean.
> >>
> >> Parameters:
> >>  - hw_pa:               [48, lpa, lpa2]
> >>  - hw_va:               [48, 52]
> >>  - kvm_mode:            [vhe, nvhe, protected]
> >>  - host_page_size:      [4KB, 16KB, 64KB]
> >>  - host_pa:             [48, 52]
> >>  - host_va:             [48, 52]
> >>  - host_load_addr:      [low, high]
> >>  - guest_page_size:     [64KB]
> >>  - guest_pa:            [52]
> >>  - guest_va:            [52]
> >>  - guest_load_addr:     [low, high]
> >>
> >> When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
> >> 'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
> >> there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
> >> hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
> >> 4GB. The FVP only allows RAM at certain locations and having a contiguous region
> >> cross the 48 bit boundary is not an option. So I chose these values to ensure
> >> that the linear map size is within 51 bits, which is a requirement for
> >> nVHE/protected mode KVM.
> >>
> >> In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
> >> FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
> >> block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
> >> fixed up for the 2 different memory layouts.
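> >>
> >> (For reference, the preloading can be done with the FVP's --data arguments,
> >> roughly like this for the host_load_addr=high case; the bl31/initrd addresses
> >> and any -C model parameters are illustrative rather than my exact command line:)
> >>
> >>   # bl31 in trusted SRAM; kernel at the 48-bit PA region; dtb/initrd in the
> >>   # 4GB region at the 52-bit PA
> >>   FVP_Base_RevC-2xAEMvA \
> >>     --data cluster0.cpu0=bl31.bin@0x04001000 \
> >>     --data cluster0.cpu0=Image@0x880000000000 \
> >>     --data cluster0.cpu0=fvp.dtb@0x8800000000000 \
> >>     --data cluster0.cpu0=initrd@0x8800008000000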
> >>
> >> Given this was designed to test my KVM changes, I was previously running these
> >> without the host_load_addr=high option for the 4k and 16k host kernels (since
> >> this requires your patch set). In this situation there are 132 valid configs and
> >> all of them pass.
> >>
> >> I then rebased my changes on top of yours and added in the host_load_addr=high
> >> option. Now there are 186 valid configs, 64 of which fail. (some of these
> >> failures are regressions). From a quick initial triage, there are 3 failure modes:
> >>
> >>
> >> 1) 18 FAILING TESTS: Host kernel never outputs anything to console
> >>
> >>   TF-A runs successfully, says it is jumping to the kernel, then nothing further
> >>   is seen. I'm pretty confident that the blobs are loaded into memory correctly
> >>   because the same framework is working for the other configs (including 64k
> >>   kernel loaded into high memory). This affects all configs where a host kernel
> >>   with 4k or 16k pages built with LPA2 support is loaded into high memory.
> >>
> >
> > Not sure how to interpret this in combination with your explanation
> > above, but if 'loaded high' means that the kernel itself is not in
> > 48-bit addressable physical memory, this failure is expected.
>
> Sorry - my wording was confusing. host_load_addr=high means what I said at the
> top; the kernel image is loaded at 0x880000000000 in a block of memory sized to
> hold the kernel image only (actually it's aligned up to 2MB). The dtb and
> initrd are loaded into a 4GB region at 0x8800000000000. The reason I'm doing
> this is to ensure that when I create a VM, the memory used for it (at least the
> vast majority) is coming from the region at 52 bits. I want to do this to prove
> that the stage 2 implementation is correctly handling the 52-bit OA case.
>
> >
> > Given that we have no way of informing the bootloader or firmware
> > whether or not a certain kernel image supports running from such a
> > high offset, it must currently assume it cannot. We've just queued up
> > a documentation fix to clarify this in the boot protocol, i.e., that
> > the kernel must be loaded in 48-bit addressable physical memory.
>
> OK, but I think what I'm doing complies with this. Unless the DTB also has to be
> below 48 bits?
>

Ahh yes, good point. Yes, this is actually implied but we should
clarify this. Or fix it.

But the same reasoning applies: currently, a loader cannot know from
looking at a certain binary whether or not it supports addressing any
memory outside of the 48-bit addressable range, so any asset loaded
into physical memory is affected by the same limitation.

This has implications for other firmware assets as well, e.g., ACPI tables.

> >
> > The fact that you had to doctor your boot environment to get around
> > this kind of proves my point, and unless someone is silly enough to
> > ship a SoC that cannot function without this, I don't think we should
> > add this support.
> >
> > I understand how this is an interesting case for completeness from a
> > validation pov, but the reality is that adding support for this would
> > mean introducing changes amounting to dead code to fragile boot
> > sequence code that is already hard to maintain.
>
> I'm not disagreeing. But I think what I'm doing should conform with the
> requirements? (Previously I had the tests set up to just have a single region of
> memory above 52 bits and the kernel image was placed there. That works/worked
> for the 64KB kernel. But I brought the kernel image to below 48 bits to align
> with the requirements of this patch set.)
>
> If you see an easier way for me to validate 52 bit OAs in the stage 2 (and
> ideally hyp stage 1), then I'm all ears!
>

There is a Kconfig knob CONFIG_ARM64_FORCE_52BIT which was introduced
for a similar purpose, i.e., to ensure that the 52-bit range gets
utilized. I could imagine adding a similar control for KVM in
particular or for preferring allocations from outside the 48-bit PA
range in general.
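
For the VA side that is already easy to exercise, e.g. (just a sketch using the
in-tree config script):

  # make a 52-bit VA kernel hand out 52-bit user VAs by default
  ./scripts/config --file .config --enable ARM64_FORCE_52BIT
  make olddefconfig

A hypothetical analogous control that makes the host prefer allocations above
the 48-bit PA boundary would give you the stage 2 coverage without having to
load the kernel image itself high.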

> >
> >>
> >> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
> >>
> >>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
> >>   failing tests are configured for protected KVM, and are built with LPA2
> >>   support, running on non-LPA2 HW.
> >>
> >
> > I will try to reproduce this locally.
> >
> >>
> >> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
> >>
> >>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
> >>   There is no error reported, but the guest never outputs anything. Haven't
> >>   worked out which config options are common to all failures yet.
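> >>
> >>   (For reference, the guests are launched with a kvmtool command roughly along
> >>   these lines; the exact options, including those that place the guest memory
> >>   at the 52-bit IPA, vary per config and are omitted:)
> >>
> >>   lkvm run --kernel Image --initrd initrd.img --mem 512 --cpus 2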
> >>
> >
> > This goes a bit beyond what I am currently set up for in terms of
> > testing, but I'm happy to help narrow this down.
> >
> >>
> >> Finally, I removed my code, and ran with your patch set as provided. For this I
> >> hacked up my test suite to boot the host and skip booting a guest. I also
> >> didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
> >> valid configs here, of which 4 failed. They were all the same failure mode as
> >> (1) above. Failing configs were:
> >>
> >> id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
> >> ------------------------------------------------------------------
> >> 40  lpa    52     4k              52       52       high
> >> 45  lpa    52     16k             52       52       high
> >> 55  lpa2   52     4k              52       52       high
> >> 60  lpa2   52     16k             52       52       high
> >>
> >
> > Same point as above then, I guess.
> >
> >>
> >> So on the balance of probabilities, I think failure mode (1) is very likely to
> >> be due to a bug in your code. (2) and (3) could be my issue or your issue: I
> >> propose to dig into those a bit further and will get back to you on them. I
> >> don't plan to look any further into (1).
> >>
> >
> > Thanks again. (1) is expected, and (2) is something I will investigate further.
>


