[PATCH for-next v7 4/5] block: add helper to map bvec iterator for passthrough

Kanchan Joshi joshi.k at samsung.com
Thu Sep 22 08:23:31 PDT 2022


On Tue, Sep 20, 2022 at 02:08:02PM +0200, Christoph Hellwig wrote:
>> -static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
>> +static struct bio *bio_map_get(struct request *rq, unsigned int nr_vecs,
>>  		gfp_t gfp_mask)
>
>bio_map_get is a very confusing name.

I chose that name because its functionality is the opposite of what the
existing bio_map_put helper does; the two are meant to be symmetric.
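
For reference, a minimal sketch of the intended symmetry, assuming the
post-5.18 bio_kmalloc interface (abbreviated bodies, not the exact patch
code):

	static struct bio *bio_map_get(struct request *rq, unsigned int nr_vecs,
				       gfp_t gfp_mask)
	{
		struct bio *bio;

		/* allocate an uninitialized bio with room for nr_vecs */
		bio = bio_kmalloc(nr_vecs, gfp_mask);
		if (!bio)
			return NULL;
		bio_init(bio, NULL, bio->bi_inline_vecs, nr_vecs, req_op(rq));
		return bio;
	}

	static void bio_map_put(struct bio *bio)
	{
		/* undo bio_map_get: tear down and free the bio */
		bio_uninit(bio);
		kfree(bio);
	}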

>And I also still think this is
>the wrong way to go.  If plain slab allocations don't use proper
>per-cpu caches we have a MM problem and need to talk to the slab
>maintainers and not use the overkill bio_set here.

This series is not about using (or not using) a bio_set. The attempt here
has been to reuse the pre-mapped buffers (and bvecs) that we get from
io_uring.
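
For context, this is roughly how the caller side hands over a ready-made
bvec iterator (illustrative sketch; imu->bvec, nr_bvecs and len stand in
for io_uring's registered-buffer bookkeeping and are not code from the
series):

	struct iov_iter iter;

	/* io_uring already holds a pinned bio_vec array for a registered
	 * buffer, so nothing needs to be pinned again at submission time */
	iov_iter_bvec(&iter, WRITE, imu->bvec, nr_bvecs, len);
	ret = blk_rq_map_user_bvec(rq, &iter);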

>> +/* Prepare bio for passthrough IO given an existing bvec iter */
>> +int blk_rq_map_user_bvec(struct request *rq, struct iov_iter *iter)
>
>I'm a little confused about the interface we're trying to present from
>the block layer to the driver here.
>
>blk_rq_map_user_iov really should be able to detect that it is called
>on a bvec iter and just do the right thing rather than needing different
>helpers.

I explored that possibility too, but found that it does not. It maps the
user pages into the bio either directly or by copying (under certain odd
conditions), but it does not know how to deal with an existing bvec.
The reason, I guess, is that nobody felt the need to try passthrough with
bvecs before; it only makes sense in the context of io_uring passthrough.
It also felt cleaner to write a new function rather than overload
blk_rq_map_user_iov with multiple if/else branches. I tried that again
after your comment, but it did not seem to produce any good-looking code.
The other factor is that this route seemed safer, as I can be more
confident that I will not break existing users of blk_rq_map_user_iov.
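
For illustration, folding it in would presumably start with a type-based
dispatch like the below (hypothetical, not part of this series):

	/* hypothetical: route bvec iters to the new helper and leave the
	 * existing pin/copy paths untouched for everything else */
	if (iov_iter_is_bvec(iter))
		return blk_rq_map_user_bvec(rq, iter);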

>> +		/*
>> +		 * If the queue doesn't support SG gaps and adding this
>> +		 * offset would create a gap, disallow it.
>> +		 */
>> +		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset))
>> +			goto out_err;
>
>So now you limit the input that is accepted?  That's not really how
>iov_iters are used.   We can either try to reshuffle the bvecs, or
>just fall back to the copy data version as blk_rq_map_user_iov does
>for 'weird' iters.

Since I was writing a 'new' helper for passthrough only, I thought it
would not be too bad to just bail out (rather than trying to handle it
via a copy) if we hit this queue_virt_boundary related situation.

To handle it the 'copy data' way we would need this check (from
blk_rq_map_user_iov):

	else if (queue_virt_boundary(q))
		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);


But iov_iter_gap_alignment does not work on bvec iters; see line #1274 below:

1264 unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
1265 {
1266         unsigned long res = 0;
1267         unsigned long v = 0;
1268         size_t size = i->count;
1269         unsigned k;
1270
1271         if (iter_is_ubuf(i))
1272                 return 0;
1273
1274         if (WARN_ON(!iter_is_iovec(i)))
1275                 return ~0U;

Do you see a way to overcome this? Or maybe this can be revisited later,
since we are not missing a lot?
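
One way might be a bvec-aware counterpart that, like the
bvec_gap_to_prev() check quoted above, looks only at the in-page offsets.
A rough, untested sketch; the helper name is made up:

	static unsigned long bvec_iter_gap_alignment(const struct iov_iter *i)
	{
		const struct bio_vec *bvec = i->bvec;
		unsigned long res = 0, v = 0;
		size_t size = i->count;
		unsigned k;

		for (k = 0; k < i->nr_segs; k++) {
			if (!bvec[k].bv_len)
				continue;
			if (v)	/* OR start of this vec with end of previous */
				res |= bvec[k].bv_offset | v;
			v = bvec[k].bv_offset + bvec[k].bv_len;
			if (size <= bvec[k].bv_len)
				break;
			size -= bvec[k].bv_len;
		}
		return res;
	}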

>> +
>> +		/* check full condition */
>> +		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len)
>> +			goto out_err;
>> +
>> +		if (bytes + bv->bv_len <= nr_iter &&
>> +				bv->bv_offset + bv->bv_len <= PAGE_SIZE) {
>> +			nsegs++;
>> +			bytes += bv->bv_len;
>> +		} else
>> +			goto out_err;
>
>Nit: This would read much better as:
>
>		if (bytes + bv->bv_len > nr_iter)
>			goto out_err;
>		if (bv->bv_offset + bv->bv_len > PAGE_SIZE)
>			goto out_err;
>
>		bytes += bv->bv_len;
>		nsegs++;

Indeed, cleaner. Thanks.

