[RFC PATCH v2 0/7] Introduce persistent memory pool
Stanislav Kinsburskii
skinsburskii at linux.microsoft.com
Wed Sep 27 15:44:45 PDT 2023
On Thu, Sep 28, 2023 at 06:25:44PM +0800, Baoquan He wrote:
> On 09/27/23 at 09:13am, Stanislav Kinsburskii wrote:
> > On Wed, Sep 27, 2023 at 01:44:38PM +0800, Baoquan He wrote:
> > > Hi Stanislav,
> > >
> > > On 09/25/23 at 02:27pm, Stanislav Kinsburskii wrote:
> > > > This patch introduces a memory allocator specifically tailored for
> > > > persistent memory within the kernel. The allocator maintains
> > > > kernel-specific states like DMA passthrough device states, IOMMU state, and
> > > > more across kexec.
> > >
> > > Can you give more details about how this persistent memory pool will be
> > > utilized in an actual scenario? I mean, what problem have you met so that
> > > you have to introduce persistent memory pool to solve it?
> > >
> >
> > The major reason we have at the moment is that the Linux root partition
> > running on top of the Microsoft hypervisor needs to deposit pages to the
> > hypervisor at runtime, when the hypervisor runs out of memory.
> > "Depositing" here means that Linux passes a set of its PFNs to the
> > hypervisor via hypercall, and the hypervisor then uses these pages for its
> > own needs.
> >
> > Once deposited, these pages can't be accessed by Linux anymore and thus
> > must be preserved in the "used" state across kexec, as the hypervisor is
> > unaware of kexec. At the same time, these pages can be withdrawn when
> > unused. Thus, an allocator persistent across kexec looks reasonable for
> > this particular matter.
>
> Thanks for these details.
>
> The deposit and withdraw remind me of the balloon driver, David's virtio-mem,
> and DLPAR on ppc, which can grow or shrink physical memory of a guest
> OS at runtime. Can't the Microsoft hypervisor do a similar thing to reclaim
> or give back the memory from or to the 'Linux root partition' running on
> top of the hypervisor?
>
Although the Microsoft hypervisor is a type 1 hypervisor and runs on the
physical hardware, like Xen, it doesn't control all the memory, but is
rather granted memory by either the boot loader or the Linux root
partition (a similar privileged VM is called "Dom0" in the Xen world). IOW,
this works in the opposite direction: Linux gives memory to the hypervisor
and can reclaim it. However, doing so on kexec increases downtime,
as withdrawn pages must be deposited back again after booting to restore
the guests ("DomU" in Xen terminology).

It is worth mentioning that the "deposited pages" in this context don't
mean guest pages, but the pages required by the hypervisor to store the
Linux root partition state used to control guest partitions.

Also, page reclaim is not possible if guests are left running during
kexec, as the hypervisor needs to keep the Linux root partition-related
state intact to keep the guest state consistent.
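
To make the deposit/withdraw bookkeeping concrete, here is a minimal
user-space sketch in C. It is only a toy model and not code from this
patch series: the names pmpool_model, pool_deposit and pool_withdraw are
hypothetical, no real hypercall is issued, and "surviving kexec" would
correspond to the metadata structure being kept in preserved memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_PAGES 8	/* assumption: tiny fixed-size pool for illustration */

enum page_state { PAGE_FREE = 0, PAGE_DEPOSITED = 1 };

/* Hypothetical metadata that would have to be preserved across kexec. */
struct pmpool_model {
	uint64_t base_pfn;		/* first PFN covered by the pool */
	uint8_t state[POOL_PAGES];	/* per-page state */
};

static void pool_init(struct pmpool_model *p, uint64_t base_pfn)
{
	memset(p, 0, sizeof(*p));	/* all pages start as PAGE_FREE */
	p->base_pfn = base_pfn;
}

static size_t pool_free_pages(const struct pmpool_model *p)
{
	size_t i, n = 0;

	for (i = 0; i < POOL_PAGES; i++)
		if (p->state[i] == PAGE_FREE)
			n++;
	return n;
}

/*
 * Mark 'count' free pages as deposited and report their PFNs in out[].
 * A real implementation would pass these PFNs to the hypervisor here.
 * All-or-nothing: returns false if fewer than 'count' pages are free.
 */
static bool pool_deposit(struct pmpool_model *p, size_t count, uint64_t *out)
{
	size_t i, n = 0;

	assert(p != NULL && out != NULL);
	if (count > pool_free_pages(p))
		return false;

	for (i = 0; i < POOL_PAGES && n < count; i++) {
		if (p->state[i] == PAGE_FREE) {
			p->state[i] = PAGE_DEPOSITED;
			out[n++] = p->base_pfn + i;
		}
	}
	return true;
}

/* Return a deposited page to the pool; false if it was not deposited. */
static bool pool_withdraw(struct pmpool_model *p, uint64_t pfn)
{
	if (pfn < p->base_pfn || pfn - p->base_pfn >= POOL_PAGES)
		return false;
	if (p->state[pfn - p->base_pfn] != PAGE_DEPOSITED)
		return false;
	p->state[pfn - p->base_pfn] = PAGE_FREE;
	return true;
}
```

The point of the model is that a new kernel which inherits the state[]
array knows which pages are still owned by the hypervisor without
withdrawing and re-depositing them, which is the downtime cost described
above.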
> Thanks
> Baoquan