[PATCH 12/15] block: switch polling to be bio based

Keith Busch kbusch at kernel.org
Fri May 14 09:26:12 PDT 2021


On Thu, May 13, 2021 at 07:50:26PM -0700, Keith Busch wrote:
> On Wed, May 12, 2021 at 03:12:37PM -0700, Keith Busch wrote:
> > On Wed, May 12, 2021 at 03:03:40PM -0700, Sagi Grimberg wrote:
> > > On 5/12/21 6:15 AM, Christoph Hellwig wrote:
> > > > + *
> > > > + * Note: the caller must either be the context that submitted @bio, or
> > > > + * be in a RCU critical section to prevent freeing of @bio.
> > > > + */
> > > > +int bio_poll(struct bio *bio, unsigned int flags)
> > > > +{
> > > > +	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> > > > +	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
> > > > +
> > > > +	if (cookie == BLK_QC_T_NONE ||
> > > > +	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
> > > > +		return 0;
> > > > +
> > > > +	if (current->plug)
> > > > +		blk_flush_plug_list(current->plug, false);
> > > > +
> > > > +	/* not yet implemented, so this should not happen */
> > > > +	if (WARN_ON_ONCE(!queue_is_mq(q)))
> > > 
> > > What happens if the I/O wasn't (yet) queued to the bottom device
> > > (i.e. no available path in nvme-mpath)?
> > > 
> > > In this case the disk would be the mpath device node (which is
> > > not mq...)
> > 
> > The bi_cookie should remain BLK_QC_T_NONE in that case, so we wouldn't
> > get to the warning. But if that does happen, it doesn't appear that
> > anyone is going to wake up thread that needs to poll for this bio's
> > completion when a path becomes available for dispatch. I think it would
> > make sense for nvme-mpath to just clear the REQ_POLLED flag if it
> > doesn't immediately have viable path.
> 
> It looks like there is a way to hit this warning. If you induce a path
> error to create a failover, the nvme-mpath will reset the bio's bdev to
> the mpath one, which is not MQ. I can occaisionally hit it with a fault
> injection test.

Oh, and it gets a bit worse. Using pvsync2 will block forever once this
warning is hit, or if nvme-mpath had to requeue the IO for lack of a
path. io_uring doesn't have that hang problem though.

Christoph, I think patch 15 needs something like this:

---
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 7febdb57f690..d40a9331daf7 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -419,6 +419,11 @@ static void nvme_requeue_work(struct work_struct *work)
 		 * path.
 		 */
 		bio_set_dev(bio, head->disk->part0);
+
+		if (bio->bi_opf & REQ_POLLED) {
+			bio->bi_opf &= ~REQ_POLLED;
+			bio->bi_cookie = BLK_QC_T_NONE;
+		}
 		submit_bio_noacct(bio);
 	}
 }
--

This should fix the hang since requeued bio's will use an interrupt
driven queue, but it doesn't fix the warning. The recent commit
"nvme-multipath: reset bdev to ns head when failover" looks like it
makes preventing the polling thread from using the non-MQ head disk
not possible.



More information about the Linux-nvme mailing list