[PATCH for-next v7 4/5] block: add helper to map bvec iterator for passthrough

Kanchan Joshi joshi.k at samsung.com
Sun Sep 25 10:46:29 PDT 2022


On Sat, Sep 24, 2022 at 12:13:49AM +0530, Kanchan Joshi wrote:
>On Fri, Sep 23, 2022 at 05:29:41PM +0200, Christoph Hellwig wrote:
>>On Thu, Sep 22, 2022 at 08:53:31PM +0530, Kanchan Joshi wrote:
>>>>blk_rq_map_user_iov really should be able to detect that it is called
>>>>on a bvec iter and just do the right thing rather than needing different
>>>>helpers.
>>>
>>>I too explored that possibility, but found that it does not. It maps the
>>>user-pages into bio either directly or by doing that copy (in certain odd
>>>conditions) but does not know how to deal with existing bvec.
>>
>>What do you mean with existing bvec?  We allocate a brand new bio here
>>that we want to map the next chunk of the iov_iter to, and that
>>is exactly what blk_rq_map_user_iov does.  What blk_rq_map_user_iov
>>currently does not do is to implement this mapping efficiently
>>for ITER_BVEC iters
>
>It is clear that it was not written for ITER_BVEC iters.
>Otherwise that WARN_ON would not have hit.
>
>And efficency is the concern as we are moving to more heavyweight
>helper that 'handles' weird conditions rather than just 'bails out'.
>These alignment checks end up adding a loop that traverses
>the entire ITER_BVEC.
>Also blk_rq_map_user_iov uses bio_iter_advance which also seems
>cycle-consuming given below code-comment in io_import_fixed():
>
>if (offset) {
>       /*
>        * Don't use iov_iter_advance() here, as it's really slow for
>        * using the latter parts of a big fixed buffer - it iterates
>        * over each segment manually. We can cheat a bit here, because
>        * we know that:
>
>So if at all I could move the code inside blk_rq_map_user_iov, I will
>need to see that I skip doing iov_iter_advance.
>
>I still think it would be better to take this route only when there are
>other usecases/callers of this. And that is a future thing. For the current
>requirement, it seems better to prioritze efficency.
>
>>, but that is something that could and should
>>be fixed.
>>
>>>And it really felt cleaner to me write a new function rather than
>>>overloading the blk_rq_map_user_iov with multiple if/else canals.
>>
>>No.  The whole point of the iov_iter is to support this "overload".
>
>Even if I try taking that route, WARN_ON is a blocker that  prevents 
>me to put this code inside blk_rq_map_user_iov.
>
>>>But iov_iter_gap_alignment does not work on bvec iters. Line #1274 below
>>
>>So we'll need to fix it.
>
>Do you see good way to trigger this virt-alignment condition? I have
>not seen this hitting (the SG gap checks) when running with fixebufs.
>
>>>1264 unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
>>>1265 {
>>>1266         unsigned long res = 0;
>>>1267         unsigned long v = 0;
>>>1268         size_t size = i->count;
>>>1269         unsigned k;
>>>1270
>>>1271         if (iter_is_ubuf(i))
>>>1272                 return 0;
>>>1273
>>>1274         if (WARN_ON(!iter_is_iovec(i)))
>>>1275                 return ~0U;
>>>
>>>Do you see a way to overcome this. Or maybe this can be revisted as we
>>>are not missing a lot?
>>
>>We just need to implement the equivalent functionality for bvecs.  It
>>isn't really hard, it just wasn't required so far.
>
>Can the virt-boundary alignment gap exist for ITER_BVEC iter in first
>place? Two reasons to ask this question:
>
>1. Commit description of this code (from Al viro) says -
>
>"iov_iter_gap_alignment(): get rid of iterate_all_kinds()
>
>For one thing, it's only used for iovec (and makes sense only for
>those)."
>
>2. I did not hit it so far as I mentioned above.

And we also have below condition (patch of Linus) that restricts
blk_rq_map_user_iov to only iovec iterator

commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0
Author: Linus Torvalds <torvalds at linux-foundation.org>
Date:   Tue Dec 6 16:18:14 2016 -0800

    Don't feed anything but regular iovec's to blk_rq_map_user_iov

    In theory we could map other things, but there's a reason that function
    is called "user_iov".  Using anything else (like splice can do) just
    confuses it.

    Reported-and-tested-by: Johannes Thumshirn <jthumshirn at suse.de>
    Cc: Al Viro <viro at ZenIV.linux.org.uk>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>

diff --git a/block/blk-map.c b/block/blk-map.c
index b8657fa8dc9a..27fd8d92892d 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -118,6 +118,9 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
        struct iov_iter i;
        int ret;

+       if (!iter_is_iovec(iter))
+               goto fail;
+
        if (map_data)
                copy = true;
        else if (iov_iter_alignment(iter) & align)
@@ -140,6 +143,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,

 unmap_rq:
        __blk_rq_unmap_user(bio);
+fail:
        rq->bio = NULL;
        return -EINVAL;
 }



More information about the Linux-nvme mailing list