[Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt

Igor Mammedov imammedo at redhat.com
Fri May 10 08:05:01 PDT 2019


On Fri, 10 May 2019 11:58:38 +0200
Auger Eric <eric.auger at redhat.com> wrote:

> Hi Shameer,
> 
> On 5/10/19 11:27 AM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> > 
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger at redhat.com]
> >> Sent: 10 May 2019 10:16
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi at huawei.com>;
> >> Laszlo Ersek <lersek at redhat.com>; Igor Mammedov
> >> <imammedo at redhat.com>
> >> Cc: peter.maydell at linaro.org; xuwei (O) <xuwei5 at huawei.com>; Anshuman
> >> Khandual <anshuman.khandual at arm.com>; Catalin Marinas
> >> <Catalin.Marinas at arm.com>; ard.biesheuvel at linaro.org;
> >> will.deacon at arm.com; qemu-devel at nongnu.org; Linuxarm
> >> <linuxarm at huawei.com>; linux-mm <linux-mm at kvack.org>;
> >> qemu-arm at nongnu.org; Jonathan Cameron
> >> <jonathan.cameron at huawei.com>; Robin Murphy <robin.murphy at arm.com>;
> >> linux-arm-kernel at lists.infradead.org
> >> Subject: Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu
> >> ARM/virt
> >>
> >> Hi Shameer,
> >>
> >> On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Laszlo Ersek [mailto:lersek at redhat.com]
> >>>> Sent: 09 May 2019 22:48
> >>>> To: Igor Mammedov <imammedo at redhat.com>
> >>>> Cc: Robin Murphy <robin.murphy at arm.com>; Shameerali Kolothum Thodi
> >>>> <shameerali.kolothum.thodi at huawei.com>; will.deacon at arm.com; Catalin
> >>>> Marinas <Catalin.Marinas at arm.com>; Anshuman Khandual
> >>>> <anshuman.khandual at arm.com>; linux-arm-kernel at lists.infradead.org;
> >>>> linux-mm <linux-mm at kvack.org>; qemu-devel at nongnu.org;
> >>>> qemu-arm at nongnu.org; eric.auger at redhat.com;
> >> peter.maydell at linaro.org;
> >>>> Linuxarm <linuxarm at huawei.com>; ard.biesheuvel at linaro.org; Jonathan
> >>>> Cameron <jonathan.cameron at huawei.com>; xuwei (O)
> >> <xuwei5 at huawei.com>
> >>>> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> >>>>
> >>>> On 05/09/19 18:35, Igor Mammedov wrote:
> >>>>> On Wed, 8 May 2019 22:26:12 +0200
> >>>>> Laszlo Ersek <lersek at redhat.com> wrote:
> >>>>>
> >>>>>> On 05/08/19 14:50, Robin Murphy wrote:
> >>>>>>> Hi Shameer,
> >>>>>>>
> >>>>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>>>>>> least
> >>>>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>>>>>> memory is handled by kernel.
> >>>>>>>>
> >>>>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>>>>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>>>>>
> >>>>>>>> But when we added the DT node corresponding to the PCDIMM(cold
> >> plug
> >>>>>>>> scenario), we noticed that Guest kernel see this memory during early
> >>>> boot
> >>>>>>>> even if we are booting with ACPI. Because of this, hotpluggable
> >> memory
> >>>>>>>> may end up in zone normal and make it non-hot-un-pluggable even if
> >>>> Guest
> >>>>>>>> boots with ACPI.
> >>>>>>>>
> >>>>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>>>>>> interpret the
> >>>>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>>>>>> property
> >>>>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>>>>>> differentiate
> >>>>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>>>>>
> >>>>>>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>>>>>> changes
> >>>>>>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>>>>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>>>>>
> >>>>>>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>>>>>
> >>>>>>>> /******
> >>>>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>>>>>> distinguish
> >>>>>>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> >>>> can
> >>>>>>>> only
> >>>>>>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>>>>>> hog neither
> >>>>>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>>>>>> boot?
> >>>>>>>>
> >>>>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>>>>>> Assuming
> >>>>>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
> >>>> memmap
> >>>>>>>> will not include the range despite it being present at boot.
> >>>>>>>> Presumably, ACPI
> >>>>>>>> will refer to the range somehow, however. Will that not confuse the
> >> OS?
> >>>>>>>>
> >>>>>>>> When Igor raised this earlier, I suggested that
> >>>>>>>> hotpluggable-and-present should
> >>>>>>>> be added by the firmware, but also allocated immediately, as
> >>>>>>>> EfiBootServicesData
> >>>>>>>> type memory. This will prevent other drivers in the firmware from
> >>>>>>>> allocating AcpiNVS
> >>>>>>>> or Reserved chunks from the same memory range, the UEFI memmap
> >> will
> >>>>>>>> contain
> >>>>>>>> the range as EfiBootServicesData, and then the OS can release that
> >>>>>>>> allocation in
> >>>>>>>> one go early during boot.
> >>>>>>>>
> >>>>>>>> But this really has to be clarified from the Linux kernel's
> >>>>>>>> expectations. Please
> >>>>>>>> formalize all of the following cases:
> >>>>>>>>
> >>>>>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should
> >> report
> >>>>>>>> as  DT/ACPI should report as
> >>>>>>>> -----------------  ------------------
> >>>>>>>> -------------------------------  ------------------------
> >>>>>>>>
> >>>> DT                 present             ?
> >>>>               ?
> >>>>>>>>
> >>>> DT                 absent              ?
> >>>>                ?
> >>>>>>>>
> >>>> ACPI               present             ?
> >>>>               ?
> >>>>>>>>
> >>>> ACPI               absent              ?
> >>>>               ?
> >>>>>>>>
> >>>>>>>> Again, this table is dictated by Linux."
> >>>>>>>>
> >>>>>>>> ******/
> >>>>>>>>
> >>>>>>>> Could you please take a look at this and let us know what is expected
> >>>>>>>> here from
> >>>>>>>> a Linux kernel view point.
> >>>>>>>
> >>>>>>> For arm64, so far we've not even been considering DT-based hotplug - as
> >>>>>>> far as I'm aware there would still be a big open question there around
> >>>>>>> notification mechanisms and how to describe them. The DT stuff so far
> >>>>>>> has come from the PowerPC folks, so it's probably worth seeing what
> >>>>>>> their ideas are.
> >>>>>>>
> >>>>>>> ACPI-wise I've always assumed/hoped that hotplug-related things
> >> should
> >>>>>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> >>>>>>> would be enough for us.
> >>>>>>
> >>>>>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
> >>>>>> dumping the table with the many question marks on Shameer --, all the
> >>>>>> hot-plug language in the spec refers to USB and PCI hot-plug in the
> >>>>>> preboot environment. There is not a single word about hot-plug at OS
> >>>>>> runtime (regarding any device or component type), nor about memory
> >>>>>> hot-plug (at any time).
> >>>>>>
> >>>>>> Looking to x86 appears valid -- so what does the Linux kernel expect on
> >>>>>> that architecture, in the "ACPI" rows of the table?
> >>>>>
> >>>>> I could only answer from QEMU x86 perspective.
> >>>>> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> >>>>> because of different linux guests tend to cannibalize it, making it non
> >>>>> unpluggable. The last culprit I recall was KASLR.
> >>>>>
> >>>>> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> >>>>> it's possible (it's probably hack (spec deosn't say anything about it)
> >>>>> but it mostly works for Linux (plug/unplug) and Windows guest also
> >>>>> fine with plug part (no unplug there)).
> >>>>
> >>>> I can accept this as a perfectly valid design. Which would mean, QEMU
> >> should
> >>>> mark each hotpluggable RAM range in the DTB for the firmware with the
> >>>> special new property, regardless of its initial ("cold") plugged-ness, and then
> >>>> the firmware will not expose the range in the GCD memory space map, and
> >>>> consequently in the UEFI memmap either.
> >>>>
> >>>> IOW, our table is, thus far:
> >>>>
> >>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> >>>> DT/ACPI should report as
> >>>> -----------------  ------------------  -------------------------------  ------------------------
> >>>> DT                 present
> >>>> ABSENT                           ?
> >>>> DT                 absent
> >>>> ABSENT                           ?
> >>>> ACPI               present             ABSENT
> >>>> PRESENT
> >>>> ACPI               absent              ABSENT
> >>>> ABSENT
> >>>> In the firmware, I only need to care about the GetMemoryMap() column, so
> >> I
> >>>> can work with this.
> >>>
> >>> Thank you all for the inputs.
> >>>
> >>> I assume we will still report the DT cold plug case to kernel(hotpluggable &
> >> present).
> >>> so the table will be something like this,
> >>>
> >>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> >> DT/ACPI should report as
> >>> -----------------  ------------------  -------------------------------  ------------------------
> >>> DT                 present             ABSENT
> >> PRESENT
> >>> DT                 absent              ABSENT
> >> ABSENT
> >> With DT boot, how does the OS get to know if thehotpluggable memory is
> >> present or absent? Or maybe I misunderstand the last column.
> > 
> > It doesn't. For hotpluggable & present case it will be just like normal memory and
> > for absent case no memory node(hotpluaggble) is populated in DT. Is this acceptable?
> OK I get it now. Yes it makes sense.
> > 
> > On another note, if there are no strong case for DT cold plug for PCDIMM we can drop
> > it altogether which will make everything much simpler and no change required for
> > UEFI as well.
> I don't think we have strong requirements for PCDIMM in DT mode (initial
> RAM can be used). As long as we can detect an attempt to use PCDIMM in
> DT only mode and reject it (-no-acpi or !firmware_loaded ?), personally
> I don't have any objection.
It seems we are in agreement here, let's skip DT part for now.
We can add it later if there is demand for it (I don't see any issues wrt migration here).

> 
> Thanks
> 
> Eric
> > 
> > Thanks,
> > Shameer
> > 
> > 
> >> Thanks
> >>
> >> Eric
> >>> ACPI               present             ABSENT
> >> PRESENT
> >>> ACPI               absent              ABSENT
> >> ABSENT
> >>>
> >>>
> >>>  Can someone please file a feature request at
> >>>> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
> >>>> detais?
> >>>
> >>> Ok. I will do that.
> >>>
> >>> Thanks,
> >>> Shameer
> >>>
> >>>> Thanks
> >>>> Laszlo
> >>>>
> >>>>>
> >>>>> As for physical systems, there are out there ones that do report
> >>>>> hotpluggable RAM in GetMemoryMap().
> >>>>>
> >>>>>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I
> >> suggest
> >>>>>> re-raising the question on those lists too; at least the "ACPI" rows of
> >>>>>> the table.
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Laszlo
> >>>>>>
> >>>>>>>
> >>>>>>> Robin.
> >>>>>>>
> >>>>>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
> >>>>>>>> any valid
> >>>>>>>> points above).
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Shameer
> >>>>>>>> [0] https://patchwork.kernel.org/cover/10890919/
> >>>>>>>> [1] https://patchwork.kernel.org/patch/10863299/
> >>>>>>>> [2] https://patchwork.kernel.org/patch/10890937/
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>




More information about the linux-arm-kernel mailing list