[PATCHv6 11/11] iomap: add support for dma aligned direct-io

Eric Farman farman at linux.ibm.com
Thu Jun 23 13:32:55 PDT 2022


On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch at kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device
> > > > for
> > > > direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > 
> > > Hi Keith,
> > > 
> > > Our s390 PV guests recently started failing to boot from a -next
> > > host,
> > > and git blame brought me here.
> > > 
> > > As near as I have been able to tell, we start tripping up on this
> > > code
> > > from patch 9 [1] that gets invoked with this patch:
> > > 
> > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > 
> > > > 		if (len > size)
> > > > 			len = size;
> > > > 		if (len & len_mask)
> > > > 			return false;
> > > 
> > > The iovec we're failing on has two segments, one with a len of
> > > x200
> > > (and base of x...000) and another with a len of xe00 (and a base
> > > of
> > > x...200), while len_mask is of course xfff.
> > > 
> > > So before I go any further on what we might have broken, do you
> > > happen
> > > to have any suggestions what might be going on here, or something
> > > I
> > > should try?
> > 
> > Thanks for the notice, sorry for the trouble. This check wasn't
> > intended to
> > have any difference from the previous code with respect to the
> > vector lengths.
> > 
> > Could you tell me if you're accessing this through the block device
> > direct-io,
> > or through iomap filesystem?

Reasonably certain the failure's on iomap. I'd reverted the subject
patch from next-20220622 and got things in working order.

> 
> If using iomap, the previous check was this:
> 
> 	unsigned int blkbits =
> blksize_bits(bdev_logical_block_size(iomap->bdev));
> 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> 	...
> 	if ((pos | length | align) & ((1 << blkbits) - 1))
> 		return -EINVAL;
> 
> 
...
> The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in
> your
> example, and checked against 0xfff should have been failing prior to
> this
> patch. Unless I'm missing something...

Nope, you're not. I didn't look back at what the old check was doing,
just saw "0xe00 and 0x200" and thought "oh there's one page" instead of
noting the code was or'ing them. My bad.

That was the last entry in my trace before the guest gave up, as
everything else through this code up to that point seemed okay. I'll
pick up the working case and see if I can get a clearer picture between
the two.





More information about the Linux-nvme mailing list