[PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
Yan Zhao
yan.y.zhao at intel.com
Fri Dec 13 01:49:30 PST 2024
Hi Eric,
This is a repost of the patch "kexec_core: Accept unaccepted kexec
destination addresses" [1], rebased to v6.13-rc2.
The code implementation remains unchanged, but the patch message now
includes more background and explanations to address previous concerns from
you and Baoquan.
Additionally, below is a more detailed explanation of unaccepted memory in
TDX. Please let me know if it is still not clear enough.
== Unaccepted memory in TDX ==
Intel TDX (Trust Domain Extensions) provides a hardware-based trusted
execution environment for TDs (Trust Domains, i.e. hardware-isolated VMs).
The host OS is not trusted: although it allocates physical pages for TDs,
it does not and cannot know the contents of a TD's pages.
A TD's memory is added by the host in one of two ways, each using
different instructions:
1. For a TD's initial private memory, such as firmware HOBs:
- This type of memory is added without requiring the TD's acceptance.
- The TD later performs attestation of the pages' GPAs and contents.
2. For a TD's runtime private memory:
- After the host adds the memory, it stays pending until the TD accepts it.
Memory added by method 1 is not relevant to the unaccepted memory we will
discuss.
For memory added by method 2, the TD's acceptance can occur before or after
the TD's memory access:
(a) Access first:
- TD accesses a private GPA,
- Host OS allocates physical memory,
- Host OS requests hardware to map the physical page to the GPA,
- TD accepts the GPA.
(b) Accept first:
- TD accepts a private GPA,
- Host OS allocates physical memory,
- Host OS requests hardware to map the physical page to the GPA,
- TD accesses the GPA.
"(a) Access first" is regarded as unsafe for a Linux guest and is
therefore not used.
In "(b) Accept first", the TD's "accept" operation consists of the
following steps:
- Trigger a VM-exit.
- The host OS allocates a physical page and requests hardware to map the
physical page to the GPA.
- Initialize the page contents to zero.
- Encrypt the memory.
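The "accept first" ordering can be illustrated with a toy user-space model of a single page. Everything here is hypothetical (the toy_* names, the state enum and the 4 KiB page size); it is not the TDX module or the kernel's accept path. Only the zeroing step is modelled: the VM-exit, host mapping and encryption are not, and a repeat accept is simply ignored rather than reproducing what hardware does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one private guest page; names are hypothetical. */
#define TOY_PAGE_SIZE 4096

enum toy_page_state { TOY_PENDING, TOY_ACCEPTED };

struct toy_page {
    enum toy_page_state state;
    uint8_t data[TOY_PAGE_SIZE];
};

/* Models the "accept" step: the page is zeroed and becomes usable.
 * VM-exit, host mapping and encryption are not modelled. */
static void toy_accept(struct toy_page *p)
{
    if (p->state == TOY_ACCEPTED)
        return; /* this toy ignores a repeat accept */
    memset(p->data, 0, sizeof(p->data));
    p->state = TOY_ACCEPTED;
}

/* "Accept first": the guest must never touch a page still pending. */
static void toy_write(struct toy_page *p, size_t off, uint8_t val)
{
    assert(p->state == TOY_ACCEPTED && "access before accept");
    assert(off < TOY_PAGE_SIZE);
    p->data[off] = val;
}

static uint8_t toy_read(const struct toy_page *p, size_t off)
{
    assert(p->state == TOY_ACCEPTED && "access before accept");
    assert(off < TOY_PAGE_SIZE);
    return p->data[off];
}
```

Whatever the host left in the page before acceptance is not visible afterwards, because accepting zeroes it before the guest's first read or write.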
To enable the "Accept first" approach, an "unaccepted memory" mechanism is
used, which requires cooperation from the virtual firmware and the Linux
guest.
1. The host OS adds initial private memory that does not require the TD's
acceptance. The host OS composes EFI_HOB_RESOURCE_DESCRIPTORs and loads
the virtual firmware first. Guest RAM, excluding the initial private
memory, is reported as UNACCEPTED in the descriptors.
2. The virtual firmware parses the descriptors and accepts the UNACCEPTED
memory below 4G. It then excludes the below-4G range from the UNACCEPTED
range.
3. The virtual firmware loads the Linux guest image (the address to load is
below 4G).
4. The Linux guest builds the unaccepted bitmap with help from the virtual
firmware:
- Locate EFI_UNACCEPTED_MEMORY entries from the memory map returned by
the efi_get_memory_map boot service.
- Request via EFI boot service to allocate an unaccepted_table in memory
of type EFI_ACPI_RECLAIM_MEMORY (E820_TYPE_ACPI) to hold the
unaccepted bitmap.
- Install the unaccepted_table as an EFI configuration table via the
boot service.
- Initialize the unaccepted bitmap according to the
EFI_UNACCEPTED_MEMORY entries.
5. The Linux guest decompresses the kernel image. It accepts the target GPA
for decompression first in case it is not accepted by the virtual
firmware.
6. The Linux guest calls memblock_free_all() to put all memory into the
freelists for the buddy allocator. memblock_free_all() further calls
down to __free_pages_core() to handle memory in 4M (order 10) units.
- In eager mode, the Linux guest accepts all memory and appends it to the
freelists.
- In lazy mode, the Linux guest checks if the entire 4M memory has been
accepted by querying the unaccepted bitmap.
a) If all memory is accepted, it adds the 4M memory to the freelists.
b) If any memory is unaccepted (even if the range contains accepted
pages), the Linux guest does not add the 4M memory to the freelists.
Instead, it queues the first page in the 4M range onto the list
zone->unaccepted_pages and sets the first page with the Unaccepted
flag.
7. When there is not enough free memory, cond_accept_memory() in the Linux
guest calls try_to_accept_memory_one() to dequeue a page from the list
zone->unaccepted_pages, clear its Unaccepted flag, accept the entire 4M
memory range represented by the page, and add the 4M memory to the
freelists.
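Steps 4, 6 and 7 can be modelled in user space roughly as below. This is a toy sketch, not kernel code: the toy_* names, the 2 MiB bitmap granularity and the 16 MiB memory size are assumptions chosen for illustration, the Unaccepted page flag is not modelled, and the real accept operation is replaced by clearing bitmap bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for this toy only: 2 MiB per bitmap bit and 4 MiB
 * (order-10) chunks, so each chunk covers two bits. */
#define UNIT_SIZE       (2ULL << 20)
#define CHUNK_SIZE      (4ULL << 20)
#define UNITS_PER_CHUNK (CHUNK_SIZE / UNIT_SIZE)
#define NR_CHUNKS       4               /* 16 MiB of toy guest RAM */

/* Hypothetical EFI_UNACCEPTED_MEMORY range, assumed unit-aligned. */
struct toy_range { uint64_t start, size; };

struct toy_zone {
    uint64_t unaccepted_bitmap;  /* bit i set => unit i not accepted */
    int queue[NR_CHUNKS];        /* stands in for zone->unaccepted_pages */
    int q_head, q_tail;
    bool on_freelist[NR_CHUNKS];
    int nr_free;                 /* chunks handed to the "buddy allocator" */
};

static uint64_t toy_chunk_mask(int chunk)
{
    return ((1ULL << UNITS_PER_CHUNK) - 1) << (chunk * UNITS_PER_CHUNK);
}

/* Step 4: set one bit per unit covered by an UNACCEPTED range. */
static void toy_init_bitmap(struct toy_zone *z, const struct toy_range *r,
                            size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (uint64_t a = r[i].start; a < r[i].start + r[i].size; a += UNIT_SIZE)
            z->unaccepted_bitmap |= 1ULL << (a / UNIT_SIZE);
}

/* Step 6, lazy mode: a fully accepted chunk goes to the freelist; a chunk
 * with any unaccepted unit is queued, even if part of it is accepted. */
static void toy_free_all(struct toy_zone *z)
{
    for (int c = 0; c < NR_CHUNKS; c++) {
        if (z->unaccepted_bitmap & toy_chunk_mask(c)) {
            assert(z->q_tail < NR_CHUNKS);
            z->queue[z->q_tail++] = c;
        } else {
            z->on_freelist[c] = true;
            z->nr_free++;
        }
    }
}

/* Step 7: dequeue the first queued chunk, accept all of it, free it. */
static bool toy_try_accept_one(struct toy_zone *z)
{
    if (z->q_head == z->q_tail)
        return false;
    int c = z->queue[z->q_head++];
    /* A real guest would run the accept operation on the whole chunk
     * here; the toy only clears the corresponding bitmap bits. */
    z->unaccepted_bitmap &= ~toy_chunk_mask(c);
    z->on_freelist[c] = true;
    z->nr_free++;
    return true;
}

/* Accept queued chunks until at least `want` chunks are free. */
static void toy_cond_accept(struct toy_zone *z, int want)
{
    while (z->nr_free < want && toy_try_accept_one(z))
        ;
}
```

With a single UNACCEPTED range covering units 3 to 7 (6 MiB to 16 MiB), chunk 0 goes straight to the freelist and chunks 1 to 3 are queued. Chunk 1 is queued even though its first unit is already accepted, which matches the point that zone->unaccepted_pages may cover accepted pages.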
== Conclusion ==
- The zone->unaccepted_pages is a mechanism to conditionally make accepted
private memory available to the page allocators.
- The unaccepted bitmap resides in the firmware's reserved memory and
persists across guest OSs. It records exactly which pages have not been
accepted.
- Memory ranges represented by zone->unaccepted_pages may contain accepted
pages.
For kexec in TDs,
- If the segments' destination addresses are within the range managed by
the buddy allocator, the pages must have been in an accepted state.
Calling accept_memory() will check the unaccepted bitmap and do nothing.
- If the segments' destination addresses are not yet managed by the buddy
allocator, the pages may or may not have been accepted.
Calling accept_memory() will perform the "accept" operation if they are
not accepted.
The second guest kernel booted by kexec obtains the unaccepted bitmap by
locating the unaccepted_table in the EFI configuration tables, so pages
whose bits are clear in the unaccepted bitmap are not accepted again.
The unaccepted table/bitmap is only useful for TDs. On a Linux host, the
kernel detects that the physical firmware does not support the memory
acceptance protocol, and accept_memory() simply bails out.
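The kexec cases above can be sketched with a toy accept routine over a persistent bitmap. This is not the patch and not the kernel's accept_memory(): toy_accept_memory(), the table layout and the 2 MiB unit are hypothetical, chosen only to show the outcomes described. Destinations that are already accepted need no work, unaccepted destinations are accepted, the second kernel does not repeat the work because it reuses the same bitmap, and the routine returns early when no table exists, as on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the unaccepted table; field names are hypothetical. */
struct toy_unaccepted_table {
    uint64_t phys_base;  /* address covered by bit 0 */
    uint64_t unit_size;  /* bytes per bit */
    uint64_t nr_units;   /* at most 64 in this toy */
    uint64_t bitmap;     /* bit set => unit not yet accepted */
};

/* Accept [start, end) and return how many units actually needed it.
 * A NULL table models a system without the table (e.g. a host). */
static unsigned toy_accept_memory(struct toy_unaccepted_table *t,
                                  uint64_t start, uint64_t end)
{
    unsigned accepted = 0;

    if (!t || start >= end)
        return 0;
    assert(start >= t->phys_base);
    assert(end <= t->phys_base + t->nr_units * t->unit_size);

    uint64_t first = (start - t->phys_base) / t->unit_size;
    uint64_t last = (end - 1 - t->phys_base) / t->unit_size;
    for (uint64_t i = first; i <= last; i++) {
        if (t->bitmap & (1ULL << i)) {
            /* A real guest would issue the accept operation here. */
            t->bitmap &= ~(1ULL << i);
            accepted++;
        }
    }
    return accepted;
}
```

For example, with bits 4 to 7 set and 2 MiB units, accepting 8 MiB to 11 MiB accepts units 4 and 5 once; calling it again on the same range, as the second kernel would, accepts nothing.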
Thanks
Yan
[1] https://lore.kernel.org/all/20241021034553.18824-1-yan.y.zhao@intel.com
Yan Zhao (1):
kexec_core: Accept unaccepted kexec segments' destination addresses
kernel/kexec_core.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--
2.43.2