[PATCH v6 1/1] nvme-multipath: implement "queue-depth" iopolicy

John Meneghini jmeneghi at redhat.com
Wed Jun 19 08:44:02 PDT 2024


On 6/11/24 21:44, Chaitanya Kulkarni wrote:
> On 6/11/24 17:20, John Meneghini wrote:
>> From: Thomas Song <tsong at purestorage.com>
>>

>> +
>> +	if ((nvme_req(rq)->flags & NVME_MPATH_CNT_ACTIVE)) {
>> +		result = atomic_dec_if_positive(&ns->ctrl->nr_active);
>> +		WARN_ON_ONCE(result < 0);
>> +	}
>>    
>>    	if (!(nvme_req(rq)->flags & NVME_MPATH_IO_STATS))
>>    		return;
> 
> can we remove result variable ? that is only used once,
> how about something like this unless there is something wrong with
> totally untested :-

Sure I can do that.
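
For illustration only (this is not the v7 change itself), here is a rough userspace sketch of checking the return value in place instead of keeping a local. model_dec_if_positive() is a hypothetical stand-in, built on C11 atomics, for the kernel's atomic_dec_if_positive(), which returns the old value minus one and only stores it when the result is not negative. finish_request() is a made-up name as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's atomic_dec_if_positive():
 * returns (old value - 1) and only stores it when the result is >= 0.
 */
static int model_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	/* On failure the CAS reloads 'old', so the loop condition re-checks it. */
	while (old > 0 && !atomic_compare_exchange_weak(v, &old, old - 1))
		;
	return old - 1;
}

/*
 * Hypothetical completion helper: no 'result' local, the return value is
 * checked in place. Returns true when the counter would have underflowed,
 * which is where the kernel code would WARN_ON_ONCE().
 */
static bool finish_request(atomic_int *nr_active)
{
	return model_dec_if_positive(nr_active) < 0;
}
```

In the driver the same shape would presumably collapse to a single WARN_ON_ONCE() around the atomic_dec_if_positive() call.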

>> +static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head)
>>    {
>> -	struct nvme_ns *ns, *found = NULL;
>> +	struct nvme_ns *ns, *old, *found = NULL;
>> +	int node = numa_node_id();
>> +
>> +	old = srcu_dereference(head->current_path[node], &head->srcu);
>> +
> 
> nit:- no need for white-line above ?

I sometimes add a line feed because I think it makes the code more readable, but everyone seems to dislike extra white lines, so
I'll remove them.

>> +inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
>> +{
>> +	switch (READ_ONCE(head->subsys->iopolicy)) {
>> +	case NVME_IOPOLICY_QD:
>> +		return nvme_queue_depth_path(head);
>> +	case NVME_IOPOLICY_RR:
>> +		return nvme_round_robin_path(head);
>> +	default:
>> +		return nvme_numa_path(head);
>> +	}
> 
> should we use another case for NVME_IOPOLICY_NUMA that will call
> nvme_numa_path() and report ratelimited error on the default label
> before settling on nvme_numa_path()?
> 
> something like this totally untested :-

Actually, I don't think this is worth it. The likelihood that the iopolicy will get corrupted is almost nil. The only way this
can happen is if there were a bug in the sysfs code that controls this variable. I've tested this enough to be confident there
is no problem here, and I don't think a warning is needed on a code path that can only be hit through a programming error.
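
To make that argument concrete, here is a minimal userspace sketch of the store path, under stated assumptions: all model_* names are made up for illustration, and model_streq() is only a simplified stand-in for sysfs_streq(). The point it shows is that the only values ever written are indices that matched an entry in the names table, so the reader side never sees anything outside the known policies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the iopolicy values and their sysfs names. */
enum model_iopolicy { MODEL_NUMA, MODEL_RR, MODEL_QD };

static const char *const model_names[] = {
	[MODEL_NUMA] = "numa",
	[MODEL_RR]   = "round-robin",
	[MODEL_QD]   = "queue-depth",
};

#define MODEL_NR_POLICIES (sizeof(model_names) / sizeof(model_names[0]))

/* Simplified stand-in for sysfs_streq(): tolerate one trailing newline. */
static int model_streq(const char *buf, const char *name)
{
	size_t len = strlen(name);

	if (strncmp(buf, name, len) != 0)
		return 0;
	return buf[len] == '\0' || (buf[len] == '\n' && buf[len + 1] == '\0');
}

/* Returns 0 and updates *policy on a match, -1 (think -EINVAL) otherwise. */
static int model_store(int *policy, const char *buf)
{
	size_t i;

	for (i = 0; i < MODEL_NR_POLICIES; i++) {
		if (model_streq(buf, model_names[i])) {
			*policy = (int)i;	/* always a valid table index */
			return 0;
		}
	}
	return -1;	/* unknown name: the stored policy is left untouched */
}
```

An unrecognized string leaves the policy unchanged, so the default: branch of the switch could only be reached through a bug elsewhere.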

>> +}
>> +
>>    static bool nvme_available_path(struct nvme_ns_head *head)
>>    {
>>    	struct nvme_ns *ns;
>> @@ -803,6 +870,28 @@ static ssize_t nvme_subsys_iopolicy_show(struct device *dev,
>>    			  nvme_iopolicy_names[READ_ONCE(subsys->iopolicy)]);
>>    }
>>    
>> +static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys,
>> +		int iopolicy)
>> +{
>> +	struct nvme_ctrl *ctrl;
>> +	int old_iopolicy = READ_ONCE(subsys->iopolicy);
>> +
>> +	if (old_iopolicy == iopolicy)
>> +		return;
>> +
>> +	WRITE_ONCE(subsys->iopolicy, iopolicy);
>> +
>> +	/* iopolicy changes clear the mpath by design */
>> +	mutex_lock(&nvme_subsystems_lock);
>> +	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
>> +		nvme_mpath_clear_ctrl_paths(ctrl);
>> +	mutex_unlock(&nvme_subsystems_lock);
>> +
>> +	pr_notice("%s: changed from %s to %s for subsysnqn %s\n", __func__,
>> +			nvme_iopolicy_names[old_iopolicy], nvme_iopolicy_names[iopolicy],
>> +			subsys->subnqn);
>> +}
>> +
>>    static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
>>    		struct device_attribute *attr, const char *buf, size_t count)
>>    {
>> @@ -812,7 +901,7 @@ static ssize_t nvme_subsys_iopolicy_store(struct device *dev,
>>    
>>    	for (i = 0; i < ARRAY_SIZE(nvme_iopolicy_names); i++) {
>>    		if (sysfs_streq(buf, nvme_iopolicy_names[i])) {
>> -			WRITE_ONCE(subsys->iopolicy, i);
>> +			nvme_subsys_iopolicy_update(subsys, i);
>>    			return count;
>>    		}
>>    	}
>> @@ -923,6 +1012,9 @@ int nvme_mpath_init_identify(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>>    	    !(ctrl->subsys->cmic & NVME_CTRL_CMIC_ANA))
>>    		return 0;
>>    
>> +	/* initialize this in the identify path to cover controller resets */
> 
> nit: If I'm not wrong, this function gets called from
> |nvme_init_identify()|,
> so it's pretty clear. That makes above comment kind of redundant ?
> However, if others want that comment here, please ignore this message.

Yes, but it's not clear that nvme_init_identify() is called in the controller reset path.

Hannes asked for a comment here so I'd like to keep this.
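
As a toy userspace model of what the comment is meant to convey (all names here are hypothetical, and the real reset path does far more than this): because the counter is zeroed in the identify step, and a reset runs identify again, the count starts from zero after every reset rather than only at first initialization.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, stripped-down controller: only the in-flight counter. */
struct model_ctrl {
	atomic_int nr_active;
};

/* Stand-in for the identify step: start the counter from zero. */
static void model_init_identify(struct model_ctrl *ctrl)
{
	atomic_store(&ctrl->nr_active, 0);
}

/* Stand-in for a controller reset, which runs identify again. */
static void model_reset(struct model_ctrl *ctrl)
{
	model_init_identify(ctrl);
}
```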

>> +	atomic_set(&ctrl->nr_active, 0);
>> +
>>    	if (!ctrl->max_namespaces ||
>>    	    ctrl->max_namespaces > le32_to_cpu(id->nn)) {
>>    		dev_err(ctrl->device,
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 73442d3f504b..d6c1fe3e2832 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -50,6 +50,8 @@ extern struct workqueue_struct *nvme_wq;
>>    extern struct workqueue_struct *nvme_reset_wq;
>>    extern struct workqueue_struct *nvme_delete_wq;
>>    
>> +extern struct mutex nvme_subsystems_lock;
>> +
>>    /*
>>     * List of workarounds for devices that required behavior not specified in
>>     * the standard.
>> @@ -195,6 +197,7 @@ enum {
>>    	NVME_REQ_CANCELLED		= (1 << 0),
>>    	NVME_REQ_USERCMD		= (1 << 1),
>>    	NVME_MPATH_IO_STATS		= (1 << 2),
>> +	NVME_MPATH_CNT_ACTIVE	= (1 << 3),
> 
> nit:- please align above to existing code ...
> 

I changed my tab stop from 4 to 8 and fixed this.

Thanks for your review. I will follow up with a v7 patch.

/John



