extend bi_size to unsigned long ?
Ming Lei
tom.leiming at gmail.com
Tue Dec 13 17:16:37 PST 2016
Hi Coly,
On Tue, Dec 13, 2016 at 10:53 PM, Coly Li <i at coly.li> wrote:
> Hi linux-block and linux-nvme lists,
>
> Recently, while working on md-raid0 DISCARD optimization, I found that the
> maximum DISCARD bio length that raid0_make_request() receives is 8388608
> sectors. This is because of the limitation of bi_size, which is a
> 32-bit unsigned int.
>
> A 32-bit bi_size means a DISCARD bio can only cover UINT_MAX>>9
> sectors, see commit a22c4d7e3440 ("block: re-add discard_granularity and
> alignment checks"). To format an xfs volume on 4x4TB NVMe SSDs, the
> original DISCARD bio has to be split 4x1024 times. If bi_size were a
> 64-bit unsigned long, then ideally the original DISCARD bio would
> only need to be split 4 times, that is, one split bio for each device.
I guess it would still need 4*2 splits even if bi_size became 64 bits,
because limit.max_discard_sectors is 32 bits.
On the other hand, it depends on the actual max discard sectors limit
from the hardware; NVMe currently just sets it to UINT_MAX (2TB).
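To make the split counts above concrete, here is a small C sketch of the
arithmetic. It is only an illustration: splits_needed() and
BIO_MAX_SECTORS_32 are made-up names, not kernel symbols, and the usage below
assumes 512-byte sectors, a 4TiB device, and limits rounded up to powers of
two (4GiB per bio for a 32-bit bi_size, 2TiB for a 32-bit
max_discard_sectors), which is how the round numbers in this thread come out.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Largest sector count a 32-bit byte counter such as bi_size can describe. */
#define BIO_MAX_SECTORS_32 ((uint64_t)UINT32_MAX >> SECTOR_SHIFT)

/*
 * Hypothetical helper (not a kernel function): number of bios needed to
 * cover dev_sectors when one bio may carry at most max_sectors.
 */
static uint64_t splits_needed(uint64_t dev_sectors, uint64_t max_sectors)
{
	assert(max_sectors > 0);
	/* Round up so that a partial tail still counts as one bio. */
	return (dev_sectors + max_sectors - 1) / max_sectors;
}
```

Under these assumptions, splits_needed(1ULL << 33, 1ULL << 23) gives 1024 per
4TiB device and splits_needed(1ULL << 33, 1ULL << 32) gives 2, hence 4*1024
versus 4*2 for four devices. The exact limit UINT_MAX>>9 is 8388607 sectors,
one short of 2^23, so the real counts can be off by one.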
>
> Nowadays it won't be a big issue since the block layer may merge the split
> bios (or may not if it's block-mq and NVMe). When the underlying device
In your case, the block layer can't merge, because request.__data_len is
still 32 bits, and worse, it looks like the block layer does not yet
consider overflow when dealing with merges.
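As a rough illustration of the kind of overflow check meant here, the sketch
below tests whether two byte counts still fit in a 32-bit length field before
they are merged. merged_len_fits_u32() is a hypothetical helper written for
this mail, not the kernel's merge code, and it only models the length
arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's merge code): true if a request
 * that already carries req_len bytes can absorb bio_len more bytes
 * without wrapping a 32-bit length field such as request.__data_len.
 */
static bool merged_len_fits_u32(uint32_t req_len, uint32_t bio_len)
{
	/* Compare against the remaining headroom instead of adding first. */
	return bio_len <= UINT32_MAX - req_len;
}
```

Two DISCARD bios that are each close to 4GB would fail this check, so they
could not be merged into one request while the length field stays 32 bits.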
> becomes larger and larger, maybe a 32-bit bi_size will hurt DISCARD
> performance.
I guess this change only makes sense for DISCARD/WRITE_SAME, and in
blkdev_issue_discard() the split bios (each one can be up to 4GB) are
always sent to the device concurrently. So did you observe an obvious
performance loss on your 4TB NVMe (leaving raid aside for now) when doing
mkfs?
>
> I know this is not simple; it changes a very important KABI. But this is
> really an interesting question to ask: do we have any idea how to extend
> bi_size from unsigned int to unsigned long ?
The concern should be the increase in bio size, but it looks like
sizeof(struct bvec_iter) won't change after .bi_size becomes 64 bits, as
long as we don't define 'bvec_iter' as compact (packed).
But I remember that sizeof(bio) can be decreased to 128 bytes if
'struct bvec_iter' is defined as compact and 'bi_phys_segments' is changed
to 'unsigned short'.
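The size argument can be checked with a small userspace sketch. The structs
below are simplified stand-ins for 'struct bvec_iter', with the field list
written from memory and sector_t assumed to be 64 bits; the struct names are
invented for this example. It also assumes GCC or Clang on a 64-bit ABI where
uint64_t is 8-byte aligned, which is where the tail padding comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Current layout: 8 + 4 + 4 + 4 = 20 bytes of fields, padded to 24. */
struct iter_size32 {
	uint64_t bi_sector;
	uint32_t bi_size;
	uint32_t bi_idx;
	uint32_t bi_bvec_done;
};

/* bi_size widened: 8 + 8 + 4 + 4 = 24 bytes of fields, no padding needed. */
struct iter_size64 {
	uint64_t bi_sector;
	uint64_t bi_size;
	uint32_t bi_idx;
	uint32_t bi_bvec_done;
};

/* Packed 32-bit layout (GCC/Clang attribute): the tail padding is gone. */
struct iter_size32_packed {
	uint64_t bi_sector;
	uint32_t bi_size;
	uint32_t bi_idx;
	uint32_t bi_bvec_done;
} __attribute__((packed));

static_assert(sizeof(struct iter_size32_packed) == 20,
	      "packed layout has no padding");
```

On x86-64 both unpacked structs should come out at 24 bytes, so widening
bi_size only costs space once the struct is packed, where 20 bytes would
grow to 24.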
Thanks,
Ming
>
> Thanks in advance.
>
> --
> Coly Li
--
Ming Lei