[PATCH v3 1/2] kho: add support for preserving vmalloc allocations

Pratyush Yadav me at yadavpratyush.com
Tue Sep 9 07:33:27 PDT 2025


Hi Mike,

Couple more thoughts.

On Mon, Sep 08 2025, Pratyush Yadav wrote:

> On Mon, Sep 08 2025, Mike Rapoport wrote:
>
>> From: "Mike Rapoport (Microsoft)" <rppt at kernel.org>
>>
>> A vmalloc allocation is preserved using binary structure similar to
>> global KHO memory tracker. It's a linked list of pages where each page
>> is an array of physical address of pages in vmalloc area.
>>
>> kho_preserve_vmalloc() hands out the physical address of the head page
>> to the caller. This address is used as the argument to
>> kho_vmalloc_restore() to restore the mapping in the vmalloc address
>> space and populate it with the preserved pages.
>>
>> Signed-off-by: Mike Rapoport (Microsoft) <rppt at kernel.org>
[...]
>> +/**
>> + * kho_restore_vmalloc - recreates and populates an area in vmalloc address
>> + * space from the preserved memory.
>> + * @preservation: physical address of the preservation metadata.
>> + *
>> + * Recreates an area in vmalloc address space and populates it with memory that
>> + * was preserved using kho_preserve_vmalloc().
>> + *
>> + * Return: pointer to the area in the vmalloc address space, NULL on failure.
>> + */
>> +void *kho_restore_vmalloc(phys_addr_t preservation)
>> +{
>> +	struct kho_vmalloc_chunk *chunk = phys_to_virt(preservation);
>> +	unsigned int align, order, shift, flags;
>> +	unsigned int idx = 0, nr;
>> +	unsigned long addr, size;
>> +	struct vm_struct *area;
>> +	struct page **pages;
>> +	int err;
>> +
>> +	flags = chunk->hdr.flags;
>> +	if (flags & ~KHO_VMALLOC_FLAGS_MASK)
>> +		return NULL;
>> +
>> +	nr = chunk->hdr.total_pages;
>> +	pages = kvmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
>> +	if (!pages)
>> +		return NULL;
>> +	order = chunk->hdr.order;
>> +	shift = PAGE_SHIFT + order;
>> +	align = 1 << shift;
>> +
>> +	while (chunk) {
>> +		struct page *page;
>> +
>> +		for (int i = 0; i < chunk->hdr.num_elms; i++) {
>> +			phys_addr_t phys = chunk->phys[i];
>> +
>> +			for (int j = 0; j < (1 << order); j++) {
>> +				page = phys_to_page(phys);
>> +				kho_restore_page(page, 0);
>> +				pages[idx++] = page;
>
> This can buffer-overflow if the previous kernel was buggy and added too
> many pages. Perhaps keep check for this?

Thinking about this a bit more, I think this should check that we found
_exactly_ chunk->hdr.total_pages pages, and should error out otherwise.
If too few are found, pages array will contain bogus data, if too many,
buffer overflow.

Also, I am not a fan of using kho_restore_page() directly. I think the
vmalloc preservation is a layer above core KHO, and it should use the
public KHO APIs. It really doesn't need to poke into internal APIs. If
any of the public APIs are insufficient, we should add new ones.

I don't suppose I'd insist on it, but something to consider since you
are likely going to do another revision anyway.

>
>> +				phys += PAGE_SIZE;
>> +			}
>> +		}
>> +
>> +		page = virt_to_page(chunk);
>> +		chunk = KHOSER_LOAD_PTR(chunk->hdr.next);
>> +		kho_restore_page(page, 0);
>> +		__free_page(page);
>> +	}
>> +
>> +	area = __get_vm_area_node(nr * PAGE_SIZE, align, shift, flags,
>> +				  VMALLOC_START, VMALLOC_END, NUMA_NO_NODE,
>> +				  GFP_KERNEL, __builtin_return_address(0));
>> +	if (!area)
>> +		goto err_free_pages_array;
>> +
>> +	addr = (unsigned long)area->addr;
>> +	size = get_vm_area_size(area);
>> +	err = vmap_pages_range(addr, addr + size, PAGE_KERNEL, pages, shift);
>> +	if (err)
>> +		goto err_free_vm_area;
>> +
>> +	return area->addr;
>
> You should free the pages array before returning here.
>
>> +
>> +err_free_vm_area:
>> +	free_vm_area(area);
>> +err_free_pages_array:
>> +	kvfree(pages);
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(kho_restore_vmalloc);
>> +
>>  /* Handling for debug/kho/out */
>>  
>>  static struct dentry *debugfs_root;

-- 
Regards,
Pratyush Yadav



More information about the kexec mailing list