[PATCHv3 1/2] block: accumulate segment page gaps per bio
Christoph Hellwig
hch at infradead.org
Mon Aug 25 06:46:50 PDT 2025
On Thu, Aug 21, 2025 at 01:44:19PM -0700, Keith Busch wrote:
> +static inline unsigned int bvec_seg_gap(struct bio_vec *bv, struct bio_vec *bvprv)
Nit: overly long line.
> +{
> + return ((bvprv->bv_offset + bvprv->bv_len) & (PAGE_SIZE - 1)) |
> + bv->bv_offset;
But what's actually more important is a good name, and a good comment.
Without much of an explanation this just looks like black magic :)
Also use the chance to document why all this is PAGE_SIZE based and
not based on either the iommu granule size or the virt boundary.
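
To make the expression above concrete, here is a minimal user-space sketch of what the helper appears to compute. The names model_bvec, model_seg_gap and MODEL_PAGE_SIZE are hypothetical stand-ins, not kernel identifiers, and a 4 KiB page size is an assumption:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct bio_vec; only the fields used here. */
struct model_bvec {
	unsigned int len;	/* models bv_len */
	unsigned int offset;	/* models bv_offset */
};

/*
 * OR together the in-page offset at which the previous segment ends
 * and the offset at which the next segment starts.  The result is 0
 * only if the previous segment ends on a page boundary and the next
 * one starts at offset 0; otherwise the set bits record how far the
 * boundary between the two segments is from page alignment.
 */
static unsigned int model_seg_gap(const struct model_bvec *bv,
				  const struct model_bvec *bvprv)
{
	assert(bv != NULL && bvprv != NULL);
	return ((bvprv->offset + bvprv->len) & (MODEL_PAGE_SIZE - 1)) |
		bv->offset;
}
```

With these definitions, a previous segment of 512 bytes at offset 0 followed by a segment at offset 0 yields 0x200, while two full, page-aligned segments yield 0.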
> + if (bvprvp) {
> + if (bvec_gap_to_prev(lim, bvprvp, bv.bv_offset))
> + goto split;
> + page_gaps |= bvec_seg_gap(&bv, &bvprv);
> + }
>
> if (nsegs < lim->max_segments &&
> bytes + bv.bv_len <= max_bytes &&
> @@ -326,6 +335,7 @@ int bio_split_io_at(struct bio *bio, const struct queue_limits *lim,
> }
>
> *segs = nsegs;
> + bio->bi_pg_bit = ffs(page_gaps);
Calling this "bit" feels odd. I guess the idea is that you only care
about power of two alignments? I think this would be much easier to
follow with the whole theory of operation spelled out somewhere in
detail, including why the compression to the set bits works, why the
PAGE granularity matters, and why we only need to set this bit when
splitting but not on bios that never get split or at least looped over
for splitting decisions.
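
As a rough illustration of the compression being asked about, the sketch below ORs a set of per-boundary gap values together and reduces them with ffs(), as the quoted hunk does. The helper names are hypothetical, and reading the result as a power-of-two alignment is my guess at the intent, not something the patch states:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* POSIX ffs() */

/*
 * OR all gap values together and return the 1-based index of the
 * lowest set bit, or 0 if no gap had any bit set.  Since
 * ffs(a | b) is the lowest set bit of either operand, the result
 * is the same as the minimum over all nonzero gaps.
 */
static unsigned int model_pg_bit(const unsigned int *gaps, size_t n)
{
	unsigned int page_gaps = 0;
	size_t i;

	for (i = 0; i < n; i++)
		page_gaps |= gaps[i];
	return (unsigned int)ffs((int)page_gaps);
}

/*
 * Assumed interpretation: a nonzero bit index means every segment
 * boundary is aligned to 1 << (bit - 1) bytes; 0 means there were
 * no gaps at all.
 */
static unsigned int model_pg_bit_to_align(unsigned int bit)
{
	assert(bit <= 32);
	return bit ? 1u << (bit - 1) : 0;
}
```

For gaps of 0x200 and 0x800 this gives bit index 10, i.e. every boundary is 512-byte aligned under the interpretation above, and the index always fits in a u8.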
> enum rw_hint bi_write_hint;
> u8 bi_write_stream;
> blk_status_t bi_status;
> +
> + /*
> + * The page gap bit indicates the lowest set bit in any page address
> + * offset between all bi_io_vecs. This field is initialized only after
> + * splitting to the hardware limits.
> + */
> + u8 bi_pg_bit;
Maybe move this one up so that all the fields only set on the submission
side stay together?