[PATCH rfc] nvme: support io stats on the mpath device
Sagi Grimberg
sagi at grimberg.me
Thu Sep 29 02:59:46 PDT 2022
> Hi Sagi,
>
> On 9/28/2022 10:55 PM, Sagi Grimberg wrote:
>> Our mpath stack device is just a shim that selects a bottom namespace
>> and submits the bio to it without any fancy splitting. This also means
>> that we don't clone the bio or have any context to the bio beyond
>> submission. However it really sucks that we don't see the mpath device
>> io stats.
>>
>> Given that the mpath device can't do that without adding some context
>> to it, we let the bottom device do it on its behalf (somewhat similar
>> to the approach taken in nvme_trace_bio_complete);
>
> Can you please paste the output of the application that shows the
> benefit of this commit ?
What do you mean? There is no noticeable effect on the application here.
With this patch applied, /sys/block/nvmeXnY/stat is no longer all zeros,
so sysstat and friends, as well as other observability tools, can
monitor IO stats on the mpath device itself.
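For reference, the stat file follows the standard block-layer layout
(Documentation/block/stat.rst). A quick sketch of pulling completed reads
and writes out of it — the line below is a made-up sample, since on a real
system you would read /sys/block/nvme0n1/stat (or whichever device) directly:

```shell
# Made-up sample of a /sys/block/<dev>/stat line; on a real system:
#   read -r line < /sys/block/nvme0n1/stat
line="  1432290     2342 91474962   134788   979209     4863 70034038   869977        0   214972  1029419        0        0        0        0     5812     9745"

# Field layout (Documentation/block/stat.rst):
#  1 read I/Os   2 read merges   3 read sectors   4 read ticks (ms)
#  5 write I/Os  6 write merges  7 write sectors  8 write ticks (ms)
#  9 in_flight  10 io_ticks     11 time_in_queue  12-15 discard  16-17 flush
reads=$(echo "$line" | awk '{print $1}')
writes=$(echo "$line" | awk '{print $5}')
inflight=$(echo "$line" | awk '{print $9}')
echo "reads completed:  $reads"
echo "writes completed: $writes"
echo "IOs in flight:    $inflight"
```

Before this patch, the mpath device's file would show zeros in every field
and only the hidden per-path nvmeXcYnZ devices would carry real numbers.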
>
>>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> ---
>> drivers/nvme/host/apple.c | 2 +-
>> drivers/nvme/host/core.c | 10 ++++++++++
>> drivers/nvme/host/fc.c | 2 +-
>> drivers/nvme/host/multipath.c | 18 ++++++++++++++++++
>> drivers/nvme/host/nvme.h | 12 ++++++++++++
>> drivers/nvme/host/pci.c | 2 +-
>> drivers/nvme/host/rdma.c | 2 +-
>> drivers/nvme/host/tcp.c | 2 +-
>> drivers/nvme/target/loop.c | 2 +-
>> 9 files changed, 46 insertions(+), 6 deletions(-)
>
> Several questions:
>
> 1. I guess that for the non-mpath case we get this for free from the
> block layer for each bio ?
Yes, blk-mq provides all the IO stat accounting, hence it is on by default.
> 2. Now we have doubled the accounting, haven't we ?
Yes. But as I listed in the cover-letter, I've been getting complaints
about how IO stats appear only for the hidden devices (the blk-mq
devices), and non-trivial logic is needed to map those back to the mpath
device, which can also depend on the path selection logic...
I think that this is very much justified; the observability experience
sucks. IMO we should have done this when nvme-multipath was introduced.
> 3. Do you have some performance numbers (we're touching the fast path
> here) ?
This is pretty lightweight; the accounting is per-cpu and only wrapped
by preemption disable, which is a very small price to pay for what we
gain. I don't have any performance numbers beyond my laptop VM, which
did not record any noticeable difference, and I don't expect one.
> 4. Should we enable this by default ?
Yes. There is no reason why nvme-mpath should be the only block device
that does not account for and expose IO stats.
More information about the Linux-nvme mailing list