[PATCHv6 11/11] iomap: add support for dma aligned direct-io

Eric Farman farman at linux.ibm.com
Mon Jun 27 08:21:20 PDT 2022


On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > > From: Keith Busch <kbusch at kernel.org>
> > > > > > 
> > > > > > Use the address alignment requirements from the block_device
> > > > > > for direct io instead of requiring addresses be aligned to the
> > > > > > block size.
> > > > > 
> > > > > Hi Keith,
> > > > > 
> > > > > Our s390 PV guests recently started failing to boot from a -next
> > > > > host, and git blame brought me here.
> > > > > 
> > > > > As near as I have been able to tell, we start tripping up on this
> > > > > code from patch 9 [1] that gets invoked with this patch:
> > > > > 
> > > > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > > > 
> > > > > > 		if (len > size)
> > > > > > 			len = size;
> > > > > > 		if (len & len_mask)
> > > > > > 			return false;
> > > > > 
> > > > > The iovec we're failing on has two segments, one with a len of
> > > > > x200 (and base of x...000) and another with a len of xe00 (and a
> > > > > base of x...200), while len_mask is of course xfff.
> > > > > 
> > > > > So before I go any further on what we might have broken, do you
> > > > > happen to have any suggestions what might be going on here, or
> > > > > something I should try?
> > > > 
> > > > Thanks for the notice, sorry for the trouble. This check wasn't
> > > > intended to have any difference from the previous code with respect
> > > > to the vector lengths.
> > > > 
> > > > Could you tell me if you're accessing this through the block device
> > > > direct-io, or through iomap filesystem?
> > 
> > Reasonably certain the failure's on iomap. I'd reverted the subject
> > patch from next-20220622 and got things in working order.
> > 
> > > If using iomap, the previous check was this:
> > > 
> > > 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > 	...
> > > 	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > 		return -EINVAL;
> > > 
> > > 
> > ...
> > > The result of "iov_iter_alignment()" would include "0xe00 | 0x200"
> > > in your example, and checked against 0xfff should have been failing
> > > prior to this patch. Unless I'm missing something...
> > 
> > Nope, you're not. I didn't look back at what the old check was
> > doing, just saw "0xe00 and 0x200" and thought "oh there's one page"
> > instead of noting the code was or'ing them. My bad.
> > 
> > That was the last entry in my trace before the guest gave up, as
> > everything else through this code up to that point seemed okay. I'll
> > pick up the working case and see if I can get a clearer picture
> > between the two.
> 
> Looking over the trace again, I realize I did dump
> iov_iter_alignment() as a comparator, and I see one pass through that
> had a non-zero response but bdev_iter_is_aligned() returned true...
> 
> count = x1000
> iov_offset = x0
> nr_segs = 1
> iov_len = x1000	(len_mask = xfff)
> iov_base = x...200 (addr_mask = x1ff)
> 
> That particular pass through is in the middle of the stuff it tried to
> do, so I don't know if that's the cause or not but it strikes me as
> unusual. Will look into that tomorrow and report back.
> 

Apologies, it took me an extra day to get back to this, but it is
indeed this pass through that's causing our boot failures. I note that
the old code (in iomap_dio_bio_iter()) did:

        if ((pos | length | align) & ((1 << blkbits) - 1))
                return -EINVAL;

With blkbits equal to 12, the resulting mask is 0xfff, and an align
value (from iov_iter_alignment()) of 0x200 fails that check and kicks
us out with -EINVAL.

The new code (in iov_iter_aligned_iovec), meanwhile, compares this:

                if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
                        return false;

iov_base (and the output of the old iov_iter_alignment() routine) ends
in 0x200, but since addr_mask is 0x1ff, 0x200 & 0x1ff is zero and this
check passes where the old one failed.

To check this, I changed the comparator to len_mask (almost certainly
not the right answer since addr_mask is then unused, but it was good
for a quick test), and our PV guests are able to boot again with -next
running in the host.

Thanks,
Eric




More information about the Linux-nvme mailing list