[PATCHv6 11/11] iomap: add support for dma aligned direct-io

Keith Busch kbusch at kernel.org
Thu Jun 23 12:11:57 PDT 2022


On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch at kernel.org>
> > > 
> > > Use the address alignment requirements from the block_device for direct
> > > io instead of requiring addresses be aligned to the block size.
> > 
> > Hi Keith,
> > 
> > Our s390 PV guests recently started failing to boot from a -next host,
> > and git blame brought me here.
> > 
> > As near as I have been able to tell, we start tripping up on this code
> > from patch 9 [1] that gets invoked with this patch:
> > 
> > >	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > >		size_t len = i->iov[k].iov_len - skip;
> > >
> > >		if (len > size)
> > >			len = size;
> > >		if (len & len_mask)
> > >			return false;
> > 
> > The iovec we're failing on has two segments, one with a len of x200
> > (and base of x...000) and another with a len of xe00 (and a base of
> > x...200), while len_mask is of course xfff.
> > 
> > So before I go any further on what we might have broken, do you happen
> > to have any suggestions what might be going on here, or something I
> > should try?
> 
> Thanks for the notice, sorry for the trouble. This check wasn't intended to
> have any difference from the previous code with respect to the vector lengths.
> 
> Could you tell me if you're accessing this through the block device direct-io,
> or through iomap filesystem?

If using iomap, the previous check was this:

	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
	unsigned int align = iov_iter_alignment(dio->submit.iter);
	...
	if ((pos | length | align) & ((1 << blkbits) - 1))
		return -EINVAL;

And if raw block device, it was this:

	if ((pos | iov_iter_alignment(iter)) &
	    (bdev_logical_block_size(bdev) - 1))
		return -EINVAL;

iov_iter_alignment() ORs together the base address and length of every
segment, so its result would include "0xe00 | 0x200" in your example. Checked
against 0xfff, that is non-zero, so this should have been failing prior to this
patch. Unless I'm missing something...
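
To make the arithmetic concrete, here is a minimal userspace sketch of that
reasoning. It is not the kernel code: "struct seg", segs_alignment() and
old_check_ok() are hypothetical stand-ins for struct iovec,
iov_iter_alignment() and the old mask test, and the base addresses in the
example are made up, since the report only gives their low bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct iovec; not the kernel type. */
struct seg {
	uintptr_t base;
	size_t len;
};

/* OR together every base and length, in the spirit of iov_iter_alignment(). */
unsigned long segs_alignment(const struct seg *s, size_t n)
{
	unsigned long res = 0;
	size_t k;

	for (k = 0; k < n; k++)
		res |= (unsigned long)s[k].base | (unsigned long)s[k].len;
	return res;
}

/* Old-style test: the I/O is accepted only if no bit under the mask is set. */
bool old_check_ok(const struct seg *s, size_t n, unsigned long mask)
{
	return (segs_alignment(s, n) & mask) == 0;
}

/*
 * The reported iovec, with made-up upper address bits (0x10000): only the
 * low bits (...000 and ...200) and the lengths come from the report.
 */
unsigned long example_low_bits(unsigned long mask)
{
	const struct seg ex[2] = {
		{ 0x10000, 0x200 },
		{ 0x10200, 0xe00 },
	};

	return segs_alignment(ex, 2) & mask;
}
```

With a mask of 0xfff, example_low_bits() leaves 0xe00, so old_check_ok() would
reject the iovec. The mask is a parameter because the old checks derive it from
the logical block size; with a smaller mask such as 0x1ff the same two segments
would leave no bits set and pass.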


