[PATCHv4 10/14] x86/tdx: Convert shared memory back to private on kexec
Edgecombe, Rick P
rick.p.edgecombe at intel.com
Tue Dec 5 17:28:08 PST 2023
On Tue, 2023-12-05 at 03:45 +0300, Kirill A. Shutemov wrote:
> +static void tdx_kexec_unshare_mem(bool crash)
> +{
> + unsigned long addr, end;
> + long found = 0, shared;
> +
> + /* Stop new private<->shared conversions */
> + conversion_allowed = false;
I wonder if this might need a compiler barrier here to be totally safe.
I'm not sure.
> +
> + /*
> + * Crash kernel reaches here with interrupts disabled: can't
> wait for
> + * conversions to finish.
> + *
> + * If race happened, just report and proceed.
> + */
> + if (!crash) {
> + unsigned long timeout;
> +
> + /*
> + * Wait for in-flight conversions to complete.
> + *
> + * Do not wait more than 30 seconds.
> + */
> + timeout = 30 * USEC_PER_SEC;
> + while (atomic_read(&conversions_in_progress) &&
> timeout--)
> + udelay(1);
> + }
> +
> + if (atomic_read(&conversions_in_progress))
> + pr_warn("Failed to finish shared<->private
> conversions\n");
I can't think of any non-ridiculous way to handle this case. Maybe we
need VMM help.
>
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 830425e6d38e..c81afffaa954 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -12,6 +12,7 @@
> #include <linux/delay.h>
> #include <linux/objtool.h>
> #include <linux/pgtable.h>
> +#include <linux/kexec.h>
> #include <acpi/reboot.h>
> #include <asm/io.h>
> #include <asm/apic.h>
> @@ -31,6 +32,7 @@
> #include <asm/realmode.h>
> #include <asm/x86_init.h>
> #include <asm/efi.h>
> +#include <asm/tdx.h>
>
> /*
> * Power off function, if any
> @@ -716,6 +718,14 @@ static void
> native_machine_emergency_restart(void)
>
> void native_machine_shutdown(void)
> {
> + /*
> + * Call enc_kexec_unshare_mem() while all CPUs are still
> active and
> + * interrupts are enabled. This will allow all in-flight
> memory
> + * conversions to finish cleanly before unsharing all memory.
> + */
> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
> kexec_in_progress)
> + x86_platform.guest.enc_kexec_unshare_mem(false);
These questions are coming from an incomplete understanding of the
kexec/reboot operation. Please disregard if it is not helpful.
By doing this while other tasks can still run, it handles the
conversion races in the !crash case. But then it sets shared pages to
NP. What happens if another active task tries to write to one?
I guess we rely on the kernel_restart_prepare()->device_shutdown() to
clean up, which runs before native_machine_shutdown(). So there might
be conversions in progress when tdx_kexec_unshare_mem() is called, from
the allocator work queues. But the actual memory won't be accessed
during that operation.
But the console must be active? Or otherwise who can see these
warnings. It doesn't use a shared page? Or the KVM clock, which looks
to clean up at cpu tear down, which now happens after
tdx_kexec_unshare_mem()? So I wonder if there might be cases.
If so, maybe you could halt the conversions in
native_machine_shutdown(), then do the actual reset to private after
tasks can't schedule. I'd still wonder about if anything might try to
access a shared page triggered by the console output.
> +
> /* Stop the cpus and apics */
> #ifdef CONFIG_X86_IO_APIC
> /*
More information about the kexec
mailing list