4.5-rc iser issues

Ming Lei ming.lei at canonical.com
Sun Feb 14 08:20:58 PST 2016


Hi Sagi,

On Sun, Feb 14, 2016 at 11:22 PM, Christoph Hellwig <hch at infradead.org> wrote:
> Adding Ming to Cc.
>
> But I don't think simply not cloning the biovecs is the right thing
> to do in the end.  This must be something with the bvec iterators.

I agree with Christoph, and there might be issues somewhere.

> From the log:
> iser: sg[0] dma_addr:0x85FC06000 off:0x0 sz:0x200 dma_len:0x200
> iser: sg[1] dma_addr:0x860334000 off:0x0 sz:0x200 dma_len:0x200 <-- gap
> iser: sg[2] dma_addr:0x860335000 off:0x0 sz:0x200 dma_len:0x200 <-- gap

The above gap shouldn't have come since blk_bio_segment_split() splits
out one new bio if gap is detected.

Sort of the following code can be added in driver or prep_fn to check if
bvec of  the rq is correct:

rq_for_each_segment(bvec, sc->request, iter) {
     //check if there is gap between bvec
}

I don't know how to use iser, and looks everything works fine after
I setup virt boundary as 4095 for null_blk by the attachment
patch.

> Full quote for Ming:
>
> On Sun, Feb 14, 2016 at 04:02:18PM +0200, Sagi Grimberg wrote:
>>
>> >>I'm bisecting now, there are a couple of patches from Ming in
>> >>the area of the bio splitting code...
>> >>
>> >>CC'ing Ming, Linux-block and Linux-nvme as iser is identical to nvme
>> >>wrt the virtual boundary so I think nvme will break as well.

The bisected commit is merged to v4.3, and looks no such kind of
report from nvme.

>> >
>> >Bisection reveals that this one is the culprit:
>> >
>> >commit 52cc6eead9095e2faf2ec7afc013aa3af1f01ac5
>> >Author: Ming Lei <ming.lei at canonical.com>
>> >Date:   Thu Sep 17 09:58:38 2015 -0600
>> >
>> >     block: blk-merge: fast-clone bio when splitting rw bios
>> >
>> >     biovecs has become immutable since v3.13, so it isn't necessary
>> >     to allocate biovecs for the new cloned bios, then we can save
>> >     one extra biovecs allocation/copy, and the allocation is often
>> >     not fixed-length and a bit more expensive.
>> >
>> >     For example, if the 'max_sectors_kb' of null blk's queue is set
>> >     as 16(32 sectors) via sysfs just for making more splits, this patch
>> >     can increase throught about ~70% in the sequential read test over
>> >     null_blk(direct io, bs: 1M).
>> >
>> >     Cc: Christoph Hellwig <hch at infradead.org>
>> >     Cc: Kent Overstreet <kent.overstreet at gmail.com>
>> >     Cc: Ming Lin <ming.l at ssi.samsung.com>
>> >     Cc: Dongsu Park <dpark at posteo.net>
>> >     Signed-off-by: Ming Lei <ming.lei at canonical.com>
>> >
>> >     This fixes a performance regression introduced by commit 54efd50bfd,
>> >     and allows us to take full advantage of the fact that we have
>> >immutable
>> >     bio_vecs. Hand applied, as it rejected violently with commit
>> >     5014c311baa2.
>> >
>> >     Signed-off-by: Jens Axboe <axboe at fb.com>
>> >--
>>
>> Looks like there is a problem with bio_clone_fast()
>>
>> This change makes the problem go away:
>> --
>> diff --git a/block/bio.c b/block/bio.c
>> index dbabd48..5e93733 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -1791,7 +1791,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
>>          * Discards need a mutable bio_vec to accommodate the payload
>>          * required by the DSM TRIM and UNMAP commands.
>>          */
>> -       if (bio->bi_rw & REQ_DISCARD)
>> +       if (1 || bio->bi_rw & REQ_DISCARD)
>>                 split = bio_clone_bioset(bio, gfp, bs);
>>         else
>>                 split = bio_clone_fast(bio, gfp, bs);
>> --
>>
>> Any thoughts?

I don't think bio_clone_fast() is wrong, which use bvec table from
the original bio, and drivers are not allowed to change the table, and
should just call the standard iterator helpers to access bvec.

Also bio_for_each_segment_all() can't be used to iterate over one
cloned bio.

Thanks,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: null_blk_sg_virt_boundary.patch
Type: text/x-patch
Size: 1621 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20160215/26a7157b/attachment.bin>


More information about the Linux-nvme mailing list