[PATCH rfc] nvme: support io stats on the mpath device

Sagi Grimberg sagi at grimberg.me
Thu Sep 29 02:59:46 PDT 2022


> Hi Sagi,
> 
> On 9/28/2022 10:55 PM, Sagi Grimberg wrote:
>> Our mpath stack device is just a shim that selects a bottom namespace
>> and submits the bio to it without any fancy splitting. This also means
>> that we don't clone the bio or have any context to the bio beyond
>> submission. However it really sucks that we don't see the mpath device
>> io stats.
>>
>> Given that the mpath device can't do that without adding some context
>> to it, we let the bottom device do it on its behalf (somewhat similar
>> to the approach taken in nvme_trace_bio_complete);
> 
> Can you please paste the output of the application that shows the 
> benefit of this commit ?

What do you mean? There is no noticeable effect on the application here.
With this patch applied, /sys/block/nvmeXnY/stat is no longer all zeros,
so sysstat and friends, as well as other observability tools, can
monitor IO stats.
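
To make concrete what such tools consume, here is a minimal userspace
sketch (not part of the patch) that reads a block device stat file and
parses the first 11 fields in the order given in the kernel's
Documentation/block/stat.rst. The struct and function names are
hypothetical, and newer kernels append discard and flush fields that
this sketch ignores.

```c
#include <stdio.h>

/* First 11 fields of a block device "stat" line, in documented order. */
struct blk_stat {
    unsigned long long read_ios, read_merges, read_sectors, read_ticks;
    unsigned long long write_ios, write_merges, write_sectors, write_ticks;
    unsigned long long in_flight, io_ticks, time_in_queue;
};

/* Parse one stat line; returns 0 on success, -1 if fewer than 11 fields. */
static int parse_blk_stat(const char *line, struct blk_stat *s)
{
    int n = sscanf(line,
                   "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->read_ios, &s->read_merges,
                   &s->read_sectors, &s->read_ticks,
                   &s->write_ios, &s->write_merges,
                   &s->write_sectors, &s->write_ticks,
                   &s->in_flight, &s->io_ticks, &s->time_in_queue);

    return n == 11 ? 0 : -1;
}

/* Read and parse /sys/block/<disk>/stat; returns 0 on success, -1 on error. */
static int read_blk_stat(const char *disk, struct blk_stat *s)
{
    char path[256], line[512];
    FILE *f;
    int ret = -1;

    snprintf(path, sizeof(path), "/sys/block/%s/stat", disk);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(line, sizeof(line), f))
        ret = parse_blk_stat(line, s);
    fclose(f);
    return ret;
}
```

Without the patch, a reader like this would see only zeros for the
mpath device and would have to go to the hidden per-path devices
instead.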

> 
>>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> ---
>>   drivers/nvme/host/apple.c     |  2 +-
>>   drivers/nvme/host/core.c      | 10 ++++++++++
>>   drivers/nvme/host/fc.c        |  2 +-
>>   drivers/nvme/host/multipath.c | 18 ++++++++++++++++++
>>   drivers/nvme/host/nvme.h      | 12 ++++++++++++
>>   drivers/nvme/host/pci.c       |  2 +-
>>   drivers/nvme/host/rdma.c      |  2 +-
>>   drivers/nvme/host/tcp.c       |  2 +-
>>   drivers/nvme/target/loop.c    |  2 +-
>>   9 files changed, 46 insertions(+), 6 deletions(-)
> 
> Several questions:
> 
> 1. I guess that for the non-mpath case we get this for free from the 
> block layer for each bio ?

blk-mq provides all IO stat accounting, hence it is on by default.

> 2. Now we have doubled the accounting, haven't we ?

Yes. But as I listed in the cover-letter, I've been getting complaints
about how IO stats appear only for the hidden devices (blk-mq devices)
and there is non-trivial logic needed to map that back to the mpath
device, which can also depend on the path selection logic...

I think that this is very much justified; the observability experience
sucks. IMO we should have done this when nvme-multipath was introduced.

> 3. Do you have some performance numbers (we're touching the fast path 
> here) ?

This is pretty light-weight, accounting is per-cpu and only wrapped by
preemption disable. This is a very small price to pay for what we gain.
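
As a rough illustration of why per-cpu accounting is cheap, here is a
userspace analogy (not the kernel code and not taken from the patch).
Each "cpu" updates only its own padded slot, so the fast path needs no
lock and no atomic, and a reader sums the slots on demand. All names
are hypothetical, NR_SLOTS stands in for the CPU count, and the 64-byte
cache line is an assumption. In the kernel the caller stays on its CPU
for the duration of the update because preemption is disabled; here the
caller simply passes a slot index.

```c
#include <stddef.h>

#define NR_SLOTS 4      /* stand-in for the number of CPUs (assumption) */

/* One counter set per slot, padded so slots do not share a cache line. */
struct slot_counter {
    unsigned long ios;
    unsigned long sectors;
    char pad[64 - 2 * sizeof(unsigned long)];
};

struct io_stats {
    struct slot_counter slot[NR_SLOTS];
};

/* Fast path: update only the caller's slot; no lock, no atomic. */
static void io_stats_account(struct io_stats *st, unsigned int cpu,
                             unsigned long sectors)
{
    struct slot_counter *c = &st->slot[cpu % NR_SLOTS];

    c->ios++;
    c->sectors += sectors;
}

/* Slow path (e.g. a reader of the stat file): sum over all slots. */
static unsigned long io_stats_total_ios(const struct io_stats *st)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < NR_SLOTS; i++)
        sum += st->slot[i].ios;
    return sum;
}

static unsigned long io_stats_total_sectors(const struct io_stats *st)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < NR_SLOTS; i++)
        sum += st->slot[i].sectors;
    return sum;
}
```

The cost moves to the reader, which has to walk every slot, but reads
of the stats are rare compared to IO submissions and completions.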

I don't have any performance numbers, other than a run on my laptop VM
that showed no noticeable difference, and I don't expect there to be one.

> 4. Should we enable this by default ?

Yes. There is no reason why nvme-mpath should be the only block device
that does not account and expose IO stats.


