dm-multipath low performance with blk-mq

Mike Snitzer snitzer at redhat.com
Tue Jan 26 08:03:24 PST 2016


On Mon, Jan 25 2016 at  4:40pm -0500,
Mike Snitzer <snitzer at redhat.com> wrote:

> On Tue, Jan 19 2016 at  5:45pm -0500,
> Mike Snitzer <snitzer at redhat.com> wrote:
> 
> > On Mon, Jan 18 2016 at  7:04am -0500,
> > Sagi Grimberg <sagig at dev.mellanox.co.il> wrote:
> > 
> > > Hi All,
> > > 
> > > I've recently tried out dm-multipath over a "super-fast" nvme device
> > > and noticed serious lock contention in dm-multipath that requires
> > > some extra attention. The nvme device is a simple loopback device
> > > emulation backed by a null_blk device.
> > > 
> > > With this setup I've seen dm-multipath push around 470K IOPS, while
> > > the native (loopback) nvme device can easily push 1500K+ IOPS.
> > > 
> > > The perf output [1] reveals huge contention on the multipath lock,
> > > a per-dm_target contention point which seems to defeat the purpose
> > > of the blk-mq I/O path.
> > > 
> > > The two current bottlenecks seem to come from multipath_busy and
> > > __multipath_map. Would it make more sense to move to a percpu_ref
> > > model with freeze/unfreeze logic for updates, similar to what blk-mq
> > > does?
> > >
> > > Thoughts?
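
To make that concrete, here is a minimal sketch of such a model,
mirroring blk-mq's q_usage_counter (untested; the mpath_* helpers and
the io_ref field below are hypothetical, not existing dm-mpath code):

#include <linux/percpu-refcount.h>
#include <linux/completion.h>

struct multipath_sketch {
	struct percpu_ref	io_ref;		/* percpu in the fast path */
	struct completion	io_drained;
};

static void mpath_io_ref_release(struct percpu_ref *ref)
{
	struct multipath_sketch *m =
		container_of(ref, struct multipath_sketch, io_ref);

	complete(&m->io_drained);
}

static int mpath_io_ref_init(struct multipath_sketch *m)
{
	init_completion(&m->io_drained);
	return percpu_ref_init(&m->io_ref, mpath_io_ref_release,
			       0, GFP_KERNEL);
}

/* fast path: no shared cacheline while the ref is in percpu mode */
static bool mpath_io_enter(struct multipath_sketch *m)
{
	return percpu_ref_tryget_live(&m->io_ref);
}

static void mpath_io_exit(struct multipath_sketch *m)
{
	percpu_ref_put(&m->io_ref);
}

/* slow path: quiesce in-flight I/O before updating the path map */
static void mpath_freeze(struct multipath_sketch *m)
{
	reinit_completion(&m->io_drained);
	percpu_ref_kill(&m->io_ref);	/* new mpath_io_enter() now fails */
	wait_for_completion(&m->io_drained);
}

static void mpath_unfreeze(struct multipath_sketch *m)
{
	percpu_ref_reinit(&m->io_ref);	/* back to percpu mode */
}

A map update would then be mpath_freeze(), swap the priority groups,
mpath_unfreeze(), leaving only the percpu get/put in the per-I/O path.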
> > 
> > Your perf output clearly does identify the 'struct multipath' spinlock
> > as a bottleneck.
> > 
> > Is it fair to assume that in your test you increased
> > md->tag_set.nr_hw_queues to > 1 in dm_init_request_based_blk_mq_queue()?
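
(For reference, the stock v4.4 code in drivers/md/dm.c sets up a single
hardware queue:

	md->tag_set.nr_hw_queues = 1;

so I'd expect a multiqueue test to bump that to something like
num_online_cpus(); that particular tweak is an assumption on my part,
not something stated in the original mail.)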
> > 
> > I'd like to start by replicating your testbed, so I'll see about
> > setting up the nvme loop driver you referenced in an earlier mail.
> > Can you share your fio job file and fio commandline for your test?
> 
> I'd still appreciate answers to my two questions above (did you modify
> md->tag_set.nr_hw_queues, and can you share your fio job?).
> 
> I've yet to reproduce your config (using hch's nvme loop driver) or

Christoph, any chance you could rebase your 'nvme-loop.2' on v4.5-rc1?

Or point me to a branch that is more current...


