NVMe controller reset

David Sariel David.Sariel at pmcs.com
Mon Mar 23 13:42:57 PDT 2015


Hi,
As a matter of fact, I was also playing around with reset recently.
Keith's note about the pre-3.15 lack of safety explains all the strange things I saw on 3.14.16.

I have installed v3.16.2 and played with it a little more. In particular, I added an NVME_IOCTL_RESET ioctl to
see that reset works. The handler simply calls nvme_dev_reset when it receives NVME_IOCTL_RESET.
Resets work just fine. Even sending multiple resets under traffic only slows the traffic a little, and fio
eventually runs to completion.

Then I tried to investigate what happens with the reset entry in sysfs on v3.16.2, and it would be great if somebody could answer the questions I have:

1) Running echo 1 > /sys/class/misc/nvme0/device/reset causes no call to nvme_dev_reset, and no controller reset happens in the NVMe controller either. So the question is: why were resets implemented via reset_work? Currently a reset is delivered to the controller (setting CC enable to 0, etc.) only when the following condition holds:
readl(&dev->bar->csts) & NVME_CSTS_CFS && dev->initialized
and even then the work is only scheduled. What is the rationale? Can't we just read /sys/class/misc/nvme0/device/reset and call nvme_dev_reset (subject, of course, to taking rcu_read_lock/rcu_read_unlock)?

2) Is it OK to add NVME_IOCTL_RESET, in case the sysfs/reset_work combination is not intended to work with multiple resets under traffic? The code I have tested is very simple:

In the driver:

/* 0x43 is used by nvme-cli */
#define NVME_IOCTL_RESET  _IO('N', 0x44)

static int nvme_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
							unsigned long arg)
{
	struct nvme_ns *ns = bdev->bd_disk->private_data;


	switch (cmd) {
	case NVME_IOCTL_ID:
		force_successful_syscall_return();
		return ns->ns_id;
	case NVME_IOCTL_ADMIN_CMD:
		return nvme_user_admin_cmd(ns->dev, (void __user *)arg);
	case NVME_IOCTL_SUBMIT_IO:
		return nvme_submit_io(ns, (void __user *)arg);
	case NVME_IOCTL_RESET:
		nvme_dev_reset(ns->dev);
		return 0;
	case SG_GET_VERSION_NUM:
		return nvme_sg_get_version_num((void __user *)arg);
	case SG_IO:
		return nvme_sg_io(ns, (void __user *)arg);
	default:
		return -ENOTTY;
	}
}
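
A side note on the request number used above: _IO('N', 0x44) encodes only a type character and a sequence number, with no argument size, which is why the userspace call passes no third argument. The following is a minimal userspace sketch, not part of the tested patch, that decodes the number with the standard macros from <linux/ioctl.h>. The helper name describe_reset_ioctl is made up for illustration, and NVME_IOCTL_RESET is the value proposed in this mail, not an upstream definition.

```c
#include <stdio.h>
#include <linux/ioctl.h>

/* Request number proposed in this mail; an assumption, not an upstream define. */
#define NVME_IOCTL_RESET _IO('N', 0x44)

/*
 * Hypothetical helper: format the fields packed into the request number
 * into buf. Returns the number of characters snprintf would have written.
 */
static int describe_reset_ioctl(char *buf, size_t len)
{
	return snprintf(buf, len, "type=%c nr=0x%02x size=%u",
			(char)_IOC_TYPE(NVME_IOCTL_RESET),
			(unsigned int)_IOC_NR(NVME_IOCTL_RESET),
			(unsigned int)_IOC_SIZE(NVME_IOCTL_RESET));
}
```

Calling describe_reset_ioctl(buf, sizeof(buf)) with a 64-byte buffer should yield a type of 'N', a sequence number of 0x44 and a size of 0. The raw numeric value of the request is deliberately not printed, since the direction bits are laid out differently on some architectures.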


In the nvme-cli:

#define COMMAND_LIST \
	ENTRY(LIST, "list", "List all NVMe devices and namespaces on machine", list) \
	ENTRY(ID_CTRL, "id-ctrl", "Send NVMe Identify Controller", id_ctrl) \
	ENTRY(ID_NS, "id-ns", "Send NVMe Identify Namespace, display structure", id_ns) \
	ENTRY(LIST_NS, "list-ns", "Send NVMe Identify List, display structure", list_ns) \
	ENTRY(GET_NS_ID, "get-ns-id", "Retrieve the namespace ID of opened block device", get_ns_id) \
	ENTRY(GET_LOG, "get-log", "Generic NVMe get log, returns log in raw format", get_log) \
	ENTRY(GET_FW_LOG, "fw-log", "Retrieve FW Log, show it", get_fw_log) \
	ENTRY(GET_SMART_LOG, "smart-log", "Retrieve SMART Log, show it", get_smart_log) \
	ENTRY(GET_ERR_LOG, "error-log", "Retrieve Error Log, show it", get_error_log) \
	ENTRY(GET_FEATURE, "get-feature", "Get feature and show the resulting value", get_feature) \
	ENTRY(SET_FEATURE, "set-feature", "Set a feature and show the resulting value", set_feature) \
	ENTRY(FORMAT, "format", "Format namespace with new block format", format) \
	ENTRY(FW_ACTIVATE, "fw-activate", "Activate new firmware slot", fw_activate) \
	ENTRY(FW_DOWNLOAD, "fw-download", "Download new firmware", fw_download) \
	ENTRY(ADMIN_PASSTHRU, "admin-passthru", "Submit arbitrary admin command, return results", admin_passthru) \
	ENTRY(IO_PASSTHRU, "io-passthru", "Submit an arbitrary IO command, return results", io_passthru) \
	ENTRY(RESET, "reset", "Reset the NVME controller", reset) \
	....
	

static int reset(int argc, char **argv)
{

	/* .... getopt stuff ... */

	err = ioctl(fd, NVME_IOCTL_RESET);
	if (err < 0)
		perror(devicename);
	printf("NVME Controller reset retval %d\n", err);
	if (!err)
		printf("NVME Controller reset.\n");
	else if (err > 0)
		fprintf(stderr, "NVMe Status: %s\n", nvme_status_to_string(err));

	return err;
}
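
For comparison with the ioctl path, the sysfs route discussed in this thread (echo 1 > .../device/reset) could be driven from a tool roughly as follows. This is only a sketch under stated assumptions: the helper names nvme_reset_path and nvme_write_reset are hypothetical, the two path layouts are the ones quoted in this thread (class/misc before 4.0, class/nvme from 4.0 on), and whether the write actually resets the controller depends on the kernel version, as described in question 1.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Build the sysfs reset path for controller number `instance`.
 * use_nvme_class != 0 selects /sys/class/nvme (kernel >= 4.0),
 * otherwise /sys/class/misc (older kernels).
 * Returns 0 on success, -1 if buf is too small.
 */
static int nvme_reset_path(char *buf, size_t len, int instance,
			   int use_nvme_class)
{
	int n = snprintf(buf, len, "/sys/class/%s/nvme%d/device/reset",
			 use_nvme_class ? "nvme" : "misc", instance);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" to the given path, the equivalent of `echo 1 > path`.
 * Returns 0 on success or a negative errno value.
 */
static int nvme_write_reset(const char *path)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -errno;
	if (fputs("1\n", f) == EOF)
		ret = -EIO;
	/* Buffered data is flushed on close, so a rejected write shows up here. */
	if (fclose(f) == EOF && !ret)
		ret = -errno;
	return ret;
}
```

A caller would build the path with nvme_reset_path(path, sizeof(path), 0, 1) for nvme0 on a 4.0 or later kernel and then pass it to nvme_write_reset(path), treating a negative return as failure. Unlike the ioctl, this needs no open reference to a namespace block device, only write permission on the sysfs file.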


Thanks
David


-----Original Message-----
From: Linux-nvme [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Busch, Keith
Sent: Tuesday, March 17, 2015 10:59 PM
To: Brandon Schulz
Cc: linux-nvme at lists.infradead.org
Subject: RE: NVMe controller reset

You could do remove + rescan with sysfs if you don't have open references on your drive.

Here's a different proposal from 18 months ago:

http://lists.infradead.org/pipermail/linux-nvme/2013-September/000391.html

I still think it's a good idea.

On Tue, 17 Mar 2015, Brandon Schulz wrote:
> I see.  Thanks for the response.
> 
> So this won't work for the back-ports to 2.6.32-ish kernels, as you 
> mentioned, so does anyone have a preference for how it is done there?
> 
> We're going to have to implement a mechanism, and if there are strong 
> opinions against implementing it as an ioctl, I'd like to hear them 
> before we go write the code and submit the patches.
> 
> My objective here is to find an acceptable mechanism that can 
> eventually be pulled into the older distro kernels.  If I'm going about this the wrong way, please let me know.
> 
> Thanks,
> Brandon
> 
> 
> 
> -----Original Message-----
> From: Keith Busch [mailto:keith.busch at intel.com]
> Sent: Tuesday, March 17, 2015 3:30 PM
> To: Brandon Schulz
> Cc: linux-nvme at lists.infradead.org
> Subject: Re: NVMe controller reset
> 
> On Tue, 17 Mar 2015, Brandon Schulz wrote:
> > What is the preferred way (if there is one) to force a controller reset from user-space?
> >
> > If, for example, a user-space tool was going to do a firmware
> > activate that required a controller reset, how would one trigger
> > the reset?  I do not see an ioctl defined for this today, or am I
> > missing something?
> 
> Like this:
> 
>    # echo 1 > /sys/class/nvme/nvme<#>/device/reset
> 
> This assumes you're using kernel >= 4.0. Before that, the entry was 
> under "class/misc" instead of "class/nvme". It was moved to a new 
> class because 'misc' wasn't giving us enough minors to have more than several dozen NVMe drives in a system, but I want hundreds.
> 
> It also assumes you're using a kernel >= 3.15. While the sysfs entry 
> exists pre-3.15, it wasn't safe to use because there was no notification callback for drivers to subscribe to.
> 
> Had I known what a pain it was to back-port the pci callback to older 
> kernels (it apparently breaks kABI), I would have used an nvme specific sysfs entry, but that's where we are today.



