[PATCH 1/4] block: bio-integrity: add support for user buffers

Kanchan Joshi joshi.k at samsung.com
Wed Oct 25 05:51:55 PDT 2023


On 10/18/2023 8:48 PM, Keith Busch wrote:
> From: Keith Busch <kbusch at kernel.org>
> 
> User space passthrough commands that utilize metadata currently need to
> bounce the "integrity" buffer through the kernel. This adds unnecessary
> overhead and memory pressure.
> 
> Add support for mapping user space directly so that we can avoid this
> costly copy. This is similar to how the bio payload utilizes user
> addresses with bio_map_user_iov().
> 
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
>   block/bio-integrity.c | 67 +++++++++++++++++++++++++++++++++++++++++++
>   include/linux/bio.h   |  8 ++++++
>   2 files changed, 75 insertions(+)
> 
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index ec8ac8cf6e1b9..08f70b837a29b 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -91,6 +91,19 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
>   }
>   EXPORT_SYMBOL(bio_integrity_alloc);
>   
> +static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
> +{
> +	bool dirty = bio_data_dir(bip->bip_bio) == READ;
> +	struct bvec_iter iter;
> +	struct bio_vec bv;
> +
> +	bip_for_each_vec(bv, bip, iter) {
> +		if (dirty && !PageCompound(bv.bv_page))
> +			set_page_dirty_lock(bv.bv_page);
> +		unpin_user_page(bv.bv_page);
> +	}
> +}
> +
>   /**
>    * bio_integrity_free - Free bio integrity payload
>    * @bio:	bio containing bip to be freed
> @@ -105,6 +118,8 @@ void bio_integrity_free(struct bio *bio)
>   
>   	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
>   		kfree(bvec_virt(bip->bip_vec));
> +	else if (bip->bip_flags & BIP_INTEGRITY_USER)
> +		bio_integrity_unmap_user(bip);
>   
>   	__bio_integrity_free(bs, bip);
>   	bio->bi_integrity = NULL;
> @@ -160,6 +175,58 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
>   }
>   EXPORT_SYMBOL(bio_integrity_add_page);
>   
> +int bio_integrity_map_user(struct bio *bio, void __user *ubuf, unsigned int len,
> +			   u32 seed, u32 maxvecs)
> +{
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
> +	struct page *stack_pages[UIO_FASTIOV];
> +	size_t offset = offset_in_page(ubuf);
> +	unsigned long ptr = (uintptr_t)ubuf;
> +	struct page **pages = stack_pages;
> +	struct bio_integrity_payload *bip;
> +	int npages, ret, i;
> +
> +	if (bio_integrity(bio) || ptr & align || maxvecs > UIO_FASTIOV)
> +		return -EINVAL;
> +
> +	bip = bio_integrity_alloc(bio, GFP_KERNEL, maxvecs);
> +	if (IS_ERR(bip))
> +		return PTR_ERR(bip);
> +
> +	ret = pin_user_pages_fast(ptr, UIO_FASTIOV, FOLL_WRITE, pages);

Why not pass maxvecs here? Passing UIO_FASTIOV pins that many pages
regardless of how many the buffer needs, and that eventually results in
a leak (missed unpin); see below.

> +	if (unlikely(ret < 0))
> +		goto free_bip;
> +
> +	npages = ret;
> +	for (i = 0; i < npages; i++) {
> +		u32 bytes = min_t(u32, len, PAGE_SIZE - offset);

Nit: bytes can be declared outside the loop.

> +		ret = bio_integrity_add_page(bio, pages[i], bytes, offset);
> +		if (ret != bytes) {
> +			ret = -EINVAL;
> +			goto release_pages;
> +		}
> +		len -= ret;

Take the case of a single '4KB + 8b' I/O.
len becomes 0 in the first iteration, but the loop still continues for
UIO_FASTIOV iterations. Only one page actually gets added through
bio_integrity_add_page().

And that one page is all that bio_integrity_unmap_user() will unpin.
The remaining pages stay pinned.
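
To put numbers on the leak described above, here is a small userspace
model of the accounting. It is only a sketch and not kernel code:
pages_spanned() and leaked_pages() are hypothetical helpers, the 4096-byte
page size is an assumption, and it assumes the pin call returns every page
it was asked for. UIO_FASTIOV is 8 in include/uapi/linux/uio.h.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */
#define MODEL_FASTIOV   8u	/* mirrors UIO_FASTIOV */

/*
 * Hypothetical helper: number of pages touched by a buffer of 'len'
 * bytes that starts 'offset' bytes into its first page.
 */
static unsigned int pages_spanned(size_t offset, size_t len)
{
	assert(offset < MODEL_PAGE_SIZE);
	if (len == 0)
		return 0;
	return (unsigned int)((offset + len + MODEL_PAGE_SIZE - 1) /
			      MODEL_PAGE_SIZE);
}

/*
 * Hypothetical helper: pages that stay pinned when 'nr_pinned' pages
 * were pinned up front but only the pages the buffer spans end up in
 * the integrity payload (and so get unpinned at free time).
 */
static unsigned int leaked_pages(size_t offset, size_t len,
				 unsigned int nr_pinned)
{
	unsigned int needed = pages_spanned(offset, len);

	return nr_pinned > needed ? nr_pinned - needed : 0;
}
```

For the 8-byte metadata buffer in the example, pages_spanned(0, 8) is 1, so
pinning MODEL_FASTIOV pages leaves 7 pinned in this model, while pinning
only the pages the buffer spans leaves none.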


