[PATCH V2 4/4] arm/xen: Read extended regions from DT and init Xen resource
Stefano Stabellini
sstabellini at kernel.org
Thu Nov 18 17:19:48 PST 2021
On Wed, 10 Nov 2021, Oleksandr wrote:
> On 28.10.21 04:40, Stefano Stabellini wrote:
>
> Hi Stefano
>
> I am sorry for the late response.
>
> > On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko at epam.com>
> > >
> > > This patch implements arch_xen_unpopulated_init() on Arm where
> > > the extended regions (if any) are gathered from DT and inserted
> > > into passed Xen resource to be used as unused address space
> > > for Xen scratch pages by unpopulated-alloc code.
> > >
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > >
> > > The extended regions are chosen by the hypervisor at the domain
> > > creation time and advertised to it via "reg" property under
> > > hypervisor node in the guest device-tree. As region 0 is reserved
> > > for grant table space (always present), the indexes for extended
> > > regions are 1...N.
> > >
> > > If arch_xen_unpopulated_init() fails for some reason the default
> > > behaviour will be restored (allocate xenballooned pages).
> > >
> > > This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
> > >
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko at epam.com>
> > > ---
> > > Changes RFC -> V2:
> > > - new patch, instead of
> > > "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
> > > unallocated space"
> > > ---
> > > arch/arm/xen/enlighten.c | 112
> > > +++++++++++++++++++++++++++++++++++++++++++++++
> > > drivers/xen/Kconfig | 2 +-
> > > 2 files changed, 113 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index dea46ec..1a1e0d3 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
> > > static phys_addr_t xen_grant_frames;
> > > #define GRANT_TABLE_INDEX 0
> > > +#define EXT_REGION_INDEX 1
> > > uint32_t xen_start_flags;
> > > EXPORT_SYMBOL(xen_start_flags);
> > > @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
> > > #endif
> > > }
> > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > +int arch_xen_unpopulated_init(struct resource *res)
> > > +{
> > > + struct device_node *np;
> > > + struct resource *regs, *tmp_res;
> > > + uint64_t min_gpaddr = -1, max_gpaddr = 0;
> > > + unsigned int i, nr_reg = 0;
> > > + struct range mhp_range;
> > > + int rc;
> > > +
> > > + if (!xen_domain())
> > > + return -ENODEV;
> > > +
> > > + np = of_find_compatible_node(NULL, NULL, "xen,xen");
> > > + if (WARN_ON(!np))
> > > + return -ENODEV;
> > > +
> > > + /* Skip region 0 which is reserved for grant table space */
> > > + while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
> > > + nr_reg++;
> > > + if (!nr_reg) {
> > > + pr_err("No extended regions are found\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
> > > + if (!regs)
> > > + return -ENOMEM;
> > > +
> > > + /*
> > > + * Create resource from extended regions provided by the hypervisor to
> > > be
> > > + * used as unused address space for Xen scratch pages.
> > > + */
> > > + for (i = 0; i < nr_reg; i++) {
> > > + rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
> > > ®s[i]);
> > > + if (rc)
> > > + goto err;
> > > +
> > > + if (max_gpaddr < regs[i].end)
> > > + max_gpaddr = regs[i].end;
> > > + if (min_gpaddr > regs[i].start)
> > > + min_gpaddr = regs[i].start;
> > > + }
> > > +
> > > + /* Check whether the resource range is within the hotpluggable range
> > > */
> > > + mhp_range = mhp_get_pluggable_range(true);
> > > + if (min_gpaddr < mhp_range.start)
> > > + min_gpaddr = mhp_range.start;
> > > + if (max_gpaddr > mhp_range.end)
> > > + max_gpaddr = mhp_range.end;
> > > +
> > > + res->start = min_gpaddr;
> > > + res->end = max_gpaddr;
> > > +
> > > + /*
> > > + * Mark holes between extended regions as unavailable. The rest of
> > > that
> > > + * address space will be available for the allocation.
> > > + */
> > > + for (i = 1; i < nr_reg; i++) {
> > > + resource_size_t start, end;
> > > +
> > > + start = regs[i - 1].end + 1;
> > > + end = regs[i].start - 1;
> > > +
> > > + if (start > (end + 1)) {
> > Should this be:
> >
> > if (start >= end)
> >
> > ?
>
> Yes, we can do this here (since the checks are equivalent) but ...
>
> > > + rc = -EINVAL;
> > > + goto err;
> > > + }
> > > +
> > > + /* There is no hole between regions */
> > > + if (start == (end + 1))
> > Also here, shouldn't it be:
> >
> > if (start == end)
> >
> > ?
>
> ... not here.
>
> As
>
> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
>
> but
>
> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
OK. But the check:
if (start >= end)
Actually covers both cases so that's the only check we need?
> >
> > I think I am missing again something in termination accounting :-)
>
> If I understand correctly, we need to follow "end = start + size - 1" rule, so
> the "end" is the last address inside a range, but not the "first" address
> outside of a range))
yeah
> > > + continue;
> > > +
> > > + /* Check whether the hole range is within the resource range
> > > */
> > > + if (start < res->start || end > res->end) {
> > By definition I don't think this check is necessary as either condition
> > is impossible?
>
>
> This is a good question, let me please explain.
> Not all extended regions provided by the hypervisor can be used here. This is
> because the addressable physical memory range for which the linear mapping
> could be created has limits on Arm, and maximum addressable range depends on
> the VA space size (CONFIG_ARM64_VA_BITS_XXX). So we decided to not filter them
> in hypervisor as this logic could be quite complex as different OS may have
> different requirement, etc. This means that we need to make sure that regions
> are within the hotpluggable range to avoid a failure later on when a region is
> pre-validated by the memory hotplug path.
>
> The following code limits the resource range based on that:
>
> + /* Check whether the resource range is within the hotpluggable range */
> + mhp_range = mhp_get_pluggable_range(true);
> + if (min_gpaddr < mhp_range.start)
> + min_gpaddr = mhp_range.start;
> + if (max_gpaddr > mhp_range.end)
> + max_gpaddr = mhp_range.end;
> +
> + res->start = min_gpaddr;
> + res->end = max_gpaddr;
>
> In current loop (when calculating and inserting holes) we also need to make
> sure that resulting hole range is within the resource range (and adjust/skip
> it if not true) as regs[] used for the calculations contains raw regions as
> they described in DT so not updated. Otherwise insert_resource() down the
> function will return an error for the conflicting operations. Yes, I could
> took a different route and update regs[] in advance to adjust/skip
> non-suitable regions in front, but I decided to do it on the fly in the loop
> here, I thought doing it in advance would add some overhead/complexity. What
> do you think?
I understand now.
> So I am afraid this check is necessary here.
>
> For example in my environment the extended regions are:
>
> (XEN) Extended region 0: 0->0x8000000
> (XEN) Extended region 1: 0xc000000->0x30000000
> (XEN) Extended region 2: 0x40000000->0x47e00000
> (XEN) Extended region 3: 0xd0000000->0xe6000000
> (XEN) Extended region 4: 0xe7800000->0xec000000
> (XEN) Extended region 5: 0xf1200000->0xfd000000
> (XEN) Extended region 6: 0x100000000->0x500000000
> (XEN) Extended region 7: 0x580000000->0x600000000
> (XEN) Extended region 8: 0x680000000->0x700000000
> (XEN) Extended region 9: 0x780000000->0x10000000000
>
> *With* the check the holes are:
>
> holes [47e00000 - cfffffff]
> holes [e6000000 - e77fffff]
> holes [ec000000 - f11fffff]
> holes [fd000000 - ffffffff]
> holes [500000000 - 57fffffff]
> holes [600000000 - 67fffffff]
> holes [700000000 - 77fffffff]
>
> And they seem to look correct, you can see that two possible holes between
> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
> skipped as they entirely located below res->start
> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
>
> *Without* the check these two holes won't be skipped and as the result
> insert_resource() will fail.
>
>
> **********
>
>
> I have one idea how we can simplify filter logic, we can drop all checks here
> (including confusing one) in Arm code and update common code a bit:
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 1a1e0d3..ed5b855 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
> struct resource *regs, *tmp_res;
> uint64_t min_gpaddr = -1, max_gpaddr = 0;
> unsigned int i, nr_reg = 0;
> - struct range mhp_range;
> int rc;
>
> if (!xen_domain())
> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
> min_gpaddr = regs[i].start;
> }
>
> - /* Check whether the resource range is within the hotpluggable range
> */
> - mhp_range = mhp_get_pluggable_range(true);
> - if (min_gpaddr < mhp_range.start)
> - min_gpaddr = mhp_range.start;
> - if (max_gpaddr > mhp_range.end)
> - max_gpaddr = mhp_range.end;
> -
> res->start = min_gpaddr;
> res->end = max_gpaddr;
>
> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
> if (start == (end + 1))
> continue;
>
> - /* Check whether the hole range is within the resource range
> */
> - if (start < res->start || end > res->end) {
> - if (start < res->start)
> - start = res->start;
> - if (end > res->end)
> - end = res->end;
> -
> - if (start >= (end + 1))
> - continue;
> - }
> -
> tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
> if (!tmp_res) {
> rc = -ENOMEM;
> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> index 1f1d8d8..a5d3ebb 100644
> --- a/drivers/xen/unpopulated-alloc.c
> +++ b/drivers/xen/unpopulated-alloc.c
> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
> void *vaddr;
> unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
> int ret;
> + struct range mhp_range;
>
> res = kzalloc(sizeof(*res), GFP_KERNEL);
> if (!res)
> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
> res->name = "Xen scratch";
> res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> + mhp_range = mhp_get_pluggable_range(true);
> +
> ret = allocate_resource(target_resource, res,
> - alloc_pages * PAGE_SIZE, 0, -1,
> + alloc_pages * PAGE_SIZE, mhp_range.start,
> mhp_range.end,
> PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
> if (ret < 0) {
> pr_err("Cannot allocate new IOMEM resource\n");
> (END)
>
> I believe, this will work on x86 as arch_get_mappable_range() is not
> implemented there,
> and the default option contains exactly what being used currently (0, -1).
>
> struct range __weak arch_get_mappable_range(void)
> {
> struct range mhp_range = {
> .start = 0UL,
> .end = -1ULL,
> };
> return mhp_range;
> }
>
> And this is going to be more generic and clear, what do you think?
Yeah this is much better, good thinking!
More information about the linux-arm-kernel
mailing list