[PATCH 1/4] block: bio-integrity: add support for user buffers
Kanchan Joshi
joshi.k at samsung.com
Wed Oct 25 05:51:55 PDT 2023
On 10/18/2023 8:48 PM, Keith Busch wrote:
> From: Keith Busch <kbusch at kernel.org>
>
> User space passthrough commands that utilize metadata currently need to
> bounce the "integrity" buffer through the kernel. This adds unnecessary
> overhead and memory pressure.
>
> Add support for mapping user space directly so that we can avoid this
> costly copy. This is similar to how the bio payload utilizes user
> addresses with bio_map_user_iov().
>
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
> block/bio-integrity.c | 67 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/bio.h | 8 ++++++
> 2 files changed, 75 insertions(+)
>
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index ec8ac8cf6e1b9..08f70b837a29b 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -91,6 +91,19 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
> }
> EXPORT_SYMBOL(bio_integrity_alloc);
>
> +static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
> +{
> + bool dirty = bio_data_dir(bip->bip_bio) == READ;
> + struct bvec_iter iter;
> + struct bio_vec bv;
> +
> + bip_for_each_vec(bv, bip, iter) {
> + if (dirty && !PageCompound(bv.bv_page))
> + set_page_dirty_lock(bv.bv_page);
> + unpin_user_page(bv.bv_page);
> + }
> +}
> +
> /**
> * bio_integrity_free - Free bio integrity payload
> * @bio: bio containing bip to be freed
> @@ -105,6 +118,8 @@ void bio_integrity_free(struct bio *bio)
>
> if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
> kfree(bvec_virt(bip->bip_vec));
> + else if (bip->bip_flags & BIP_INTEGRITY_USER)
> + bio_integrity_unmap_user(bip);;
>
> __bio_integrity_free(bs, bip);
> bio->bi_integrity = NULL;
> @@ -160,6 +175,58 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
> }
> EXPORT_SYMBOL(bio_integrity_add_page);
>
> +int bio_integrity_map_user(struct bio *bio, void __user *ubuf, unsigned int len,
> + u32 seed, u32 maxvecs)
> +{
> + struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> + unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
> + struct page *stack_pages[UIO_FASTIOV];
> + size_t offset = offset_in_page(ubuf);
> + unsigned long ptr = (uintptr_t)ubuf;
> + struct page **pages = stack_pages;
> + struct bio_integrity_payload *bip;
> + int npages, ret, i;
> +
> + if (bio_integrity(bio) || ptr & align || maxvecs > UIO_FASTIOV)
> + return -EINVAL;
> +
> + bip = bio_integrity_alloc(bio, GFP_KERNEL, maxvecs);
> + if (IS_ERR(bip))
> + return PTR_ERR(bip);
> +
> + ret = pin_user_pages_fast(ptr, UIO_FASTIOV, FOLL_WRITE, pages);
Why not pass maxvecs here? Passing UIO_FASTIOV asks to pin up to that
many pages, regardless of how many the buffer actually needs. That
eventually results in a leak (missed unpin); see below.
> + if (unlikely(ret < 0))
> + goto free_bip;
> +
> + npages = ret;
> + for (i = 0; i < npages; i++) {
> + u32 bytes = min_t(u32, len, PAGE_SIZE - offset);
Nit: bytes can be declared outside the loop.
> + ret = bio_integrity_add_page(bio, pages[i], bytes, offset);
> + if (ret != bytes) {
> + ret = -EINVAL;
> + goto release_pages;
> + }
> + len -= ret;
Take the case of a single '4KB + 8b' I/O.
len becomes 0 in the first iteration, but the loop still runs for every
pinned page (up to UIO_FASTIOV iterations). Only one page gets added via
bio_integrity_add_page(), and that is the only page that
bio_integrity_unmap_user() will unpin.
The remaining pages stay pinned.
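
To make the accounting concrete, here is a rough userspace model of the
case above. It is not kernel code and pins nothing: it only counts pages.
MODEL_PAGE_SIZE, MODEL_FASTIOV, pages_spanned() and leaked_pages() are
hypothetical names made up for this sketch, and it assumes every requested
pin succeeds and that only pages carrying part of the buffer end up in the
payload.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this model only; not taken from the patch. */
#define MODEL_PAGE_SIZE 4096u	/* assumed page size */
#define MODEL_FASTIOV   8u	/* stands in for UIO_FASTIOV */

/* Number of pages a buffer of 'len' bytes touches when it starts at
 * 'offset' bytes into its first page. */
static unsigned int pages_spanned(size_t offset, size_t len)
{
	assert(offset < MODEL_PAGE_SIZE);
	return (unsigned int)((offset + len + MODEL_PAGE_SIZE - 1) /
			      MODEL_PAGE_SIZE);
}

/* Count pages that were pinned but never reach the payload, and so
 * would never be unpinned when the payload is freed. */
static unsigned int leaked_pages(unsigned int pinned, size_t offset,
				 size_t len)
{
	unsigned int added = 0;
	unsigned int i;

	for (i = 0; i < pinned && len > 0; i++) {
		size_t room = MODEL_PAGE_SIZE - offset;
		size_t bytes = len < room ? len : room;

		len -= bytes;
		offset = 0;	/* later pages start at offset 0 */
		added++;	/* this page is tracked by the payload */
	}
	return pinned - added;
}
```

With an 8-byte metadata buffer at offset 0, leaked_pages(MODEL_FASTIOV, 0, 8)
reports 7 stray pages, while pinning only pages_spanned(0, 8) pages reports
none. Bounding the pin count by what the buffer needs (or by maxvecs), or
unpinning whatever was not added, would both close the gap in this model.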