[PATCH 10/13] PCI/P2PDMA: support compound page in p2pmem_alloc_mmap()
Logan Gunthorpe
logang at deltatee.com
Mon Dec 22 09:04:17 PST 2025
On 2025-12-19 21:04, Hou Tao wrote:
> From: Hou Tao <houtao1 at huawei.com>
>
> P2PDMA memory already supports compound pages, and the helpers for
> inserting compound pages into a VMA are also ready, so add support for
> compound pages in p2pmem_alloc_mmap() as well. This greatly reduces the
> overhead of mmap() and get_user_pages() when compound pages are enabled
> for p2pdma memory.
>
> The use of vm_private_data to save the alignment of p2pdma memory needs
> explanation. The normal way to get the alignment would be through the
> pci_dev. That could be achieved either by invoking kernfs_of() and
> sysfs_file_kobj(), or by defining a new struct kernfs_vm_ops to pass
> the kobject to the ->may_split() and ->pagesize() callbacks. The former
> approach depends too much on kernfs implementation details, and the
> latter would lead to excessive churn. Therefore, choose the simpler way
> of saving the alignment in vm_private_data instead.
>
> Signed-off-by: Hou Tao <houtao1 at huawei.com>
> ---
> drivers/pci/p2pdma.c | 48 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index e97f5da73458..4a133219ac43 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -128,6 +128,25 @@ static unsigned long p2pmem_get_unmapped_area(struct file *filp, struct kobject
> return mm_get_unmapped_area(filp, uaddr, len, pgoff, flags);
> }
>
> +static int p2pmem_may_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> + size_t align = (uintptr_t)vma->vm_private_data;
> +
> + if (!IS_ALIGNED(addr, align))
> + return -EINVAL;
> + return 0;
> +}
> +
> +static unsigned long p2pmem_pagesize(struct vm_area_struct *vma)
> +{
> + return (uintptr_t)vma->vm_private_data;
> +}
> +
> +static const struct vm_operations_struct p2pmem_vm_ops = {
> + .may_split = p2pmem_may_split,
> + .pagesize = p2pmem_pagesize,
> +};
> +
> static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> const struct bin_attribute *attr, struct vm_area_struct *vma)
> {
> @@ -136,6 +155,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> struct pci_p2pdma *p2pdma;
> struct percpu_ref *ref;
> unsigned long vaddr;
> + size_t align;
> void *kaddr;
> int ret;
>
> @@ -161,6 +181,16 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> goto out;
> }
>
> + align = p2pdma->align;
> + if (vma->vm_start & (align - 1) || vma->vm_end & (align - 1)) {
> + pci_info_ratelimited(pdev,
> + "%s: unaligned vma (%#lx~%#lx, %#lx)\n",
> + current->comm, vma->vm_start, vma->vm_end,
> + align);
> + ret = -EINVAL;
> + goto out;
> + }
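(The quoted hunk ends here. Going by the commit message, the remainder of
the patch presumably stashes the device-wide alignment in vm_private_data
and installs the vm_ops, roughly along these lines -- my reconstruction,
not the actual patch text:)

	/* Presumably done later in p2pmem_alloc_mmap(), not quoted
	 * above: record the alignment for the ->may_split() and
	 * ->pagesize() callbacks, then install them on the VMA.
	 */
	vma->vm_private_data = (void *)(uintptr_t)align;
	vma->vm_ops = &p2pmem_vm_ops;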
I'm a bit confused by some aspects of these changes. Why does the
alignment become a property of the PCI device? It appears that if the
CPU supports larger huge pages, then the size and alignment restrictions
on P2PDMA memory become stricter. So if someone is only allocating a few
KB, these changes will break their code and refuse to allocate single
pages.
I would have expected this code to allocate an appropriately aligned
block of the p2p memory based on the requirements of the current
mapping, not based on alignment requirements established when the device
is probed.
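To make that concrete, here is a rough sketch of the kind of per-mapping
alignment selection I had in mind; the helper name is made up and this is
entirely untested:

	/* Hypothetical helper, not part of the patch: pick the
	 * compound-page alignment from the mapping itself, falling
	 * back to single pages so small allocations keep working.
	 */
	static size_t p2pmem_mmap_align(struct vm_area_struct *vma)
	{
		size_t len = vma->vm_end - vma->vm_start;

		/* Use huge pages only when the mapping is big enough
		 * and suitably aligned to actually benefit from them.
		 */
		if (len < HPAGE_PMD_SIZE ||
		    !IS_ALIGNED(vma->vm_start, HPAGE_PMD_SIZE) ||
		    !IS_ALIGNED(len, HPAGE_PMD_SIZE))
			return PAGE_SIZE;

		return HPAGE_PMD_SIZE;
	}

That way a few-KB mapping still gets single pages, and the huge-page path
only kicks in when the mapping can actually make use of it.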
Logan