[PATCH v5 09/16] kexec: enable KHO support for memory preservation
Jason Gunthorpe
jgg at nvidia.com
Thu Apr 3 09:10:01 PDT 2025
On Thu, Apr 03, 2025 at 03:50:04PM +0000, Pratyush Yadav wrote:
> The patch currently has a limitation where it does not free any of the
> empty tables after a unpreserve operation. But Changyuan's patch also
> doesn't do it so at least it is not any worse off.
Why do we even have unpreserve? Just discard the entire KHO operation
in bulk.
> When working on this patch, I realized that kho_mem_deserialize() is
> currently _very_ slow. It takes over 2 seconds to make memblock
> reservations for 48 GiB of 0-order pages. I suppose this can later be
> optimized by teaching memblock_free_all() to skip preserved pages
> instead of making memblock reservations.
Yes, this was my earlier point about not having actual data to know
what the actual hot spots are. This saves a few ms on an operation
that takes over 2 seconds :)
> +typedef unsigned long khomem_desc_t;
This should be more like:
    union {
        void *table;
        phys_addr_t table_phys;
    };
Since we are not using the low bits right now and it is a lot cheaper
to convert from va to phys only once during the final step. __va is
not exactly fast.
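A rough userspace sketch of that idea, to make the "convert once" point concrete. Everything here is hypothetical: fake_ram and sketch_virt_to_phys() stand in for real memory and the kernel's virt_to_phys(), and khomem_finalize() is an invented name, not something in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's phys_addr_t (assumption: 64-bit). */
typedef uint64_t phys_addr_t;

/* One slot: a virtual pointer while building, a physical address once finalized. */
union khomem_desc {
	void *table;
	phys_addr_t table_phys;
};

/* Fake "RAM" so the sketch can translate addresses without a kernel. */
static unsigned char fake_ram[4096];

/* Hypothetical translation: the "physical" address is the offset into fake_ram. */
static phys_addr_t sketch_virt_to_phys(const void *va)
{
	return (phys_addr_t)((const unsigned char *)va - fake_ram);
}

/*
 * Final step: walk the slots and convert each populated one exactly once.
 * Until this runs, all users dereference .table directly with no translation.
 * Returns the number of slots converted.
 */
static size_t khomem_finalize(union khomem_desc *descs, size_t n)
{
	size_t converted = 0;

	for (size_t i = 0; i < n; i++) {
		if (!descs[i].table)
			continue;	/* empty slot stays zero */
		descs[i].table_phys = sketch_virt_to_phys(descs[i].table);
		converted++;
	}
	return converted;
}
```

The point of the union is that the preserve path never pays for a translation; only the one-time serialize pass does.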
> +#define PTRS_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long))
> +#define KHOMEM_L1_BITS (PAGE_SIZE * BITS_PER_BYTE)
> +#define KHOMEM_L1_MASK ((1 << ilog2(KHOMEM_L1_BITS)) - 1)
> +#define KHOMEM_L1_SHIFT (PAGE_SHIFT)
> +#define KHOMEM_L2_SHIFT (KHOMEM_L1_SHIFT + ilog2(KHOMEM_L1_BITS))
> +#define KHOMEM_L3_SHIFT (KHOMEM_L2_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_L4_SHIFT (KHOMEM_L3_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_PFN_MASK PAGE_MASK
This all works better if you just use GENMASK and FIELD_GET.
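For illustration only, a userspace sketch of what the mask style could look like. SK_GENMASK_ULL and SK_FIELD_GET are local stand-ins for the kernel's GENMASK_ULL() and FIELD_GET(), and the bit ranges assume 4 KiB pages and 8-byte table entries plugged into the macros quoted above (15 bits of L1 bitmap index, 9 bits per upper level). The SK_KHOMEM_* names are made up.

```c
#include <stdint.h>

/* Userspace stand-in for GENMASK_ULL(h, l): bits h..l set, inclusive. */
#define SK_GENMASK_ULL(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/*
 * Userspace stand-in for FIELD_GET(mask, val). For a contiguous mask,
 * (mask & ~(mask << 1)) is its lowest set bit, so the division shifts
 * the masked field down to bit 0.
 */
#define SK_FIELD_GET(mask, val) \
	(((val) & (mask)) / ((mask) & ~((mask) << 1)))

/* Assumed layout of a physical address: 4 KiB pages, 512 entries per table. */
#define SK_KHOMEM_L1_IDX SK_GENMASK_ULL(26, 12)	/* bit within an L1 bitmap page */
#define SK_KHOMEM_L2_IDX SK_GENMASK_ULL(35, 27)
#define SK_KHOMEM_L3_IDX SK_GENMASK_ULL(44, 36)
#define SK_KHOMEM_L4_IDX SK_GENMASK_ULL(53, 45)

static inline unsigned int sk_l1_index(uint64_t phys)
{
	return (unsigned int)SK_FIELD_GET(SK_KHOMEM_L1_IDX, phys);
}

static inline unsigned int sk_l2_index(uint64_t phys)
{
	return (unsigned int)SK_FIELD_GET(SK_KHOMEM_L2_IDX, phys);
}

static inline unsigned int sk_l3_index(uint64_t phys)
{
	return (unsigned int)SK_FIELD_GET(SK_KHOMEM_L3_IDX, phys);
}

static inline unsigned int sk_l4_index(uint64_t phys)
{
	return (unsigned int)SK_FIELD_GET(SK_KHOMEM_L4_IDX, phys);
}
```

Each field is then described by a single mask, with no separate shift and mask constants to keep in sync.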
> +static int __khomem_table_alloc(khomem_desc_t *desc)
> +{
> + if (khomem_desc_none(*desc)) {
Needs READ_ONCE
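To sketch why the single load matters: the check and every later use have to work from one snapshot of the slot, otherwise a concurrent writer can change it in between. The userspace analogue below swaps the kernel's READ_ONCE()/cmpxchg() for C11 atomics; sk_desc_t and sk_table_get_or_alloc() are hypothetical names and this is not the patch's code.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor slot: 0 means "no table installed yet". */
typedef _Atomic(uintptr_t) sk_desc_t;

/* Returns the table installed in *desc, allocating one if the slot was empty. */
static void *sk_table_get_or_alloc(sk_desc_t *desc, size_t table_size)
{
	/* Single load: all decisions below use this snapshot, not *desc again. */
	uintptr_t cur = atomic_load_explicit(desc, memory_order_acquire);
	void *table;

	if (cur)
		return (void *)cur;

	table = calloc(1, table_size);
	if (!table)
		return NULL;

	/* Install only if the slot is still empty; otherwise someone else won. */
	if (atomic_compare_exchange_strong(desc, &cur, (uintptr_t)table))
		return table;

	/* Lost the race: cur was updated to the winner's table by the failed CAS. */
	free(table);
	return (void *)cur;
}
```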
> +struct kho_mem_track {
> + /* Points to L4 KHOMEM descriptor, each order gets its own table. */
> + struct xarray orders;
> +};
I think it would be easy to add a 5th level and just use bits 63:57 as
a 6 bit order. Then you don't need all this stuff either.
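Roughly what I mean, as a userspace sketch: fold the order into the top of the lookup key so a single tree covers every order. The split used here (order in bits 63:57, address in bits 56:0) and all the sk_key_* names are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed split: order in bits 63:57, physical address in bits 56:0. */
#define SK_ORDER_SHIFT	57
#define SK_ADDR_MASK	((1ULL << SK_ORDER_SHIFT) - 1)

/* Build one key that selects both the order and the address. */
static inline uint64_t sk_key_pack(uint64_t phys, unsigned int order)
{
	return ((uint64_t)order << SK_ORDER_SHIFT) | (phys & SK_ADDR_MASK);
}

/* Recover the order from a packed key. */
static inline unsigned int sk_key_order(uint64_t key)
{
	return (unsigned int)(key >> SK_ORDER_SHIFT);
}

/* Recover the physical address from a packed key. */
static inline uint64_t sk_key_phys(uint64_t key)
{
	return key & SK_ADDR_MASK;
}
```

The top level of the tree would then be indexed by the order bits, which is what would replace the per-order xarray.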
> +int kho_preserve_folio(struct folio *folio)
> +{
> + unsigned long pfn = folio_pfn(folio);
> + unsigned int order = folio_order(folio);
> + int err;
> +
> + if (!kho_enable)
> + return -EOPNOTSUPP;
> +
> + down_read(&kho_out.tree_lock);
This lock still needs to go away
> +static void kho_mem_serialize(void)
> +{
> + struct kho_mem_track *tracker = &kho_mem_track;
> + khomem_desc_t *desc;
> + unsigned long order;
> +
> + xa_for_each(&tracker->orders, order, desc) {
> + if (WARN_ON(order >= NR_PAGE_ORDERS))
> + break;
> + kho_out.mem_tables[order] = *desc;
Missing the virt_to_phys?
> + nr_tables = min_t(unsigned int, len / sizeof(*tables), NR_PAGE_ORDERS);
> + for (order = 0; order < nr_tables; order++)
> + khomem_walk_preserved((khomem_desc_t *)&tables[order], order,
Missing phys_to_virt
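The two missing conversions are a pair: whatever is handed over has to be a physical address, and the receiving side has to map it back before walking it. A userspace sketch of the round trip, where rt_ram, RT_PHYS_BASE and the rt_* helpers are hypothetical stand-ins for real memory and the kernel's virt_to_phys()/phys_to_virt():

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;		/* stand-in for the kernel type */

static unsigned char rt_ram[8192];	/* fake RAM for the sketch */
#define RT_PHYS_BASE 0x100000ULL	/* hypothetical physical base of rt_ram */

static phys_addr_t rt_virt_to_phys(const void *va)
{
	return RT_PHYS_BASE + (phys_addr_t)((const unsigned char *)va - rt_ram);
}

static void *rt_phys_to_virt(phys_addr_t pa)
{
	return rt_ram + (pa - RT_PHYS_BASE);
}

/* Outgoing side: only physical addresses go into the handover array. */
static void rt_serialize(phys_addr_t *out, void *const *tables, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		out[i] = tables[i] ? rt_virt_to_phys(tables[i]) : 0;
}

/* Incoming side: map each physical address back before dereferencing it. */
static void rt_deserialize(void **tables, const phys_addr_t *in, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		tables[i] = in[i] ? rt_phys_to_virt(in[i]) : NULL;
}
```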
Please don't remove the KHOSER stuff, and do use it with proper
structs and types. It is part of keeping this stuff understandable.
Jason