[PATCH v7 1/1] nvme-multipath: implement "queue-depth" iopolicy
John Meneghini
jmeneghi at redhat.com
Thu Jun 20 10:54:29 PDT 2024
On 6/20/24 02:56, Christoph Hellwig wrote:
>> [jmeneghi: vairious changes and improvements, addressed review comments]
>> -static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>> - int node, struct nvme_ns *old)
>> +static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
>> {
>> - struct nvme_ns *ns, *found = NULL;
>> + struct nvme_ns *ns, *old, *found = NULL;
>> + int node = numa_node_id();
>> +
>> + old = srcu_dereference(head->current_path[node], &head->srcu);
>> + if (unlikely(!old))
>> + return __nvme_find_path(head, node);
>
> Can you split the refactoring of the existing path selectors into a
> prep patch, please?
Yes, I can do that.
>> +static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
>> + int iopolicy)
>> +{
>> + struct nvme_ctrl *ctrl;
>> + int old_iopolicy = READ_ONCE(subsys->iopolicy);
>> +
>> + if (old_iopolicy == iopolicy)
>> + return;
>> +
>> + WRITE_ONCE(subsys->iopolicy, iopolicy);
>
> What is the atomicy model here? There doesn't seem to be any
> global lock protecting it? Maybe move it into the
> nvme_subsystems_lock critical section?
Good question. I didn't write this code. Yes, I agree this looks racy. Updates to the subsys->iopolicy variable are not atomic.
They don't need to be. The process of changing the iopolicy doesn't need to be synchronized and each CPU's cache will be updated
lazily. This was done to avoid the expense of adding (another) atomic read the io path.
So really, the nvme_subsystems_lock only protects the list_entrys in &nvme_subsystems. I can move the WRITE_ONCE() update to
after the lock, but then were did the caller - nvme_subsys_iopolicy_update() - get it's reference for the nvme_subsystem* from?
And there's no lock or synchronization on the read side.
The subsys->iopolicy has always been unprotected. It's been that way from since before this change.
At least... that's my read of the code.
>> + pr_notice("%s: changed from %s to %s for subsysnqn %s\n", __func__,
>> + nvme_iopolicy_names[old_iopolicy], nvme_iopolicy_names[iopolicy],
>> + subsys->subnqn);
>
> The function is not really relevant here, this should become something
> like:
>
> pr_notice("%s: changing iopolicy from %s to %s\n",
> subsys->subnqn,
> nvme_iopolicy_names[old_iopolicy],
> nvme_iopolicy_names[iopolicy]);
How about:
pr_notice("Changed iopolicy from %s to %s for subsysnqn %s\n",
nvme_iopolicy_names[old_iopolicy],
nvme_iopolicy_names[iopolicy],
subsys->subnqn);
> or maybe:
>
> dev_notice(changing iopolicy from %s to %s\n",
> &subsys->dev,
> nvme_iopolicy_names[old_iopolicy],
> nvme_iopolicy_names[iopolicy]);
>
The dev_notice will only spit out the nvme controller number (e.g. nvme5c5n3) and that's not very helpful to the user. I want to
include the subsystemNQN because that's easily visible and something the user can see on both the storage and the host.
Example:
root:resultsF > echo "round-robin" > /sys/class/nvme-subsystem/nvme-subsys11/iopolicy
[Thu Jun 20 13:42:59 2024] Changed iopolicy from queue-depth to round-robin for subsysnqn
nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
root:resultsF > nvme list-subsys | grep -A 8 nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
nvme-subsys11 - NQN=nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0034-5310-8057-b2c04f355333
iopolicy=round-robin
\
+- nvme8 tcp traddr=172.18.60.14,trsvcid=4420,src_addr=172.18.60.4 live
+- nvme17 tcp traddr=172.18.50.15,trsvcid=4420,src_addr=172.18.50.3 live
+- nvme12 tcp traddr=172.18.60.16,trsvcid=4420,src_addr=172.18.60.4 live
+- nvme10 tcp traddr=172.18.50.13,trsvcid=4420,src_addr=172.18.50.3 live
/John
More information about the Linux-nvme
mailing list