[PATCH 1/3] block: introduce blk_queue_nr_active()

Sagi Grimberg sagi at grimberg.me
Wed Oct 4 02:19:11 PDT 2023


> I don't know if this is due to the "using xarray on 'small' arrays is
> horrible for performance" that Hannes mentioned. Maybe reverting that
> patch would help things, but I still prefer the atomics approach for its
> simplicity and the fact that the data does not indicate that the two new
> RMW ops per I/O are a source of issues. If the contention is still
> considered a problem, we can "split" the atomic into pieces along:
> 
> - namespace boundaries, so that the atomic lives in struct nvme_ns
>    instead of struct nvme_ctrl. if different CPUs are doing I/O to
>    different namespaces (which may be a common access pattern), this will
>    reduce the contention on the atomic. this would give us pretty much a
>    1:1 translation of the queue-length path selector from dm.
> - hctx boundaries - when we calculate the queue depth, instead of
>    summing the depths for every path's hctxs, only consider the hctx
>    associated with the local CPU.

I think that the hctx-boundary option is the best approach. It matches
the existing synchronization boundary in the I/O path today.
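
To make that concrete, here is a rough sketch of what a local-hctx read
could look like. This is not the actual patch: the helper name
nvme_path_local_depth() is made up for illustration, and it assumes
hctx->nr_active is a usable depth gauge (that counter is only maintained
for shared-tag accounting, so the real series would likely need its own
per-hctx counter or another source):

#include <linux/blk-mq.h>

/*
 * Illustrative only: estimate a path's depth from the hw queue that
 * services the submitting CPU, instead of summing one controller-wide
 * atomic across all CPUs.
 */
static unsigned int nvme_path_local_depth(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int cpu = raw_smp_processor_id();
	unsigned long i;

	queue_for_each_hw_ctx(q, hctx, i) {
		/*
		 * Only the hctx mapped to the local CPU contributes.
		 * The CPU may migrate right after this, but for a
		 * path-selection heuristic that is acceptable.
		 */
		if (cpumask_test_cpu(cpu, hctx->cpumask))
			return atomic_read(&hctx->nr_active);
	}

	return 0;
}

The upside is that both the read and the per-I/O inc/dec stay on the hctx
a CPU already uses for submission, so the counter cacheline is not bounced
across the whole controller; the trade-off is that the selector only sees
a per-CPU slice of the queue depth rather than the global picture.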


