[PATCH 5/5] NVMe: IO queue deletion re-write
Sagi Grimberg
sagig at dev.mellanox.co.il
Sun Jan 3 09:05:18 PST 2016
On 03/01/2016 13:40, Christoph Hellwig wrote:
> On Sat, Jan 02, 2016 at 09:30:09PM +0000, Keith Busch wrote:
>> The async deletion was written in response to a bug report of a "hang" on
>> device removal. The "hang" was the controller taking on the order of
>> hundreds of milliseconds to delete a queue (sometimes >1 sec if lots of
>> commands were queued). That controller had 2k queues and took ~15 minutes
>> to delete them serially. Async deletion brought it down to ~20 seconds, so
>> it looked like a good idea.
>>
>> It wasn't a controller I make, so I personally don't care about
>> parallelizing queue deletion. The driver's been this way for so long,
>> though, that I don't have a good way to know how beneficial this feature
>> is anymore.
>
> How about something like the lightly tested patch below? It uses
> synchronous command submission, but schedules a work item on the
> system unbound workqueue for each queue, allowing the scheduler
> to execute them in parallel.
>
> ---
> From: Christoph Hellwig <hch at lst.de>
> Date: Sun, 3 Jan 2016 12:09:36 +0100
> Subject: nvme: semi-synchronous queue deletion
>
> Replace the complex async queue deletion scheme with a work item
> per queue that is scheduled to the system unbound workqueue. That
> way we can use the normal synchronous command submission helpers,
> but let the scheduler distribute the deletions over all available
> CPUs.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
> drivers/nvme/host/pci.c | 180 +++++++-----------------------------------------
> 1 file changed, 25 insertions(+), 155 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index b82bbea..68ba2d4 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -89,13 +89,6 @@ static void nvme_process_cq(struct nvme_queue *nvmeq);
> static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
> static void nvme_dev_shutdown(struct nvme_dev *dev);
>
> -struct async_cmd_info {
> - struct kthread_work work;
> - struct kthread_worker *worker;
> - int status;
> - void *ctx;
> -};
> -
> /*
> * Represents an NVM Express device. Each nvme_dev is a PCI function.
> */
> @@ -128,6 +121,10 @@ struct nvme_dev {
> #define NVME_CTRL_RESETTING 0
>
> struct nvme_ctrl ctrl;
> +
> + /* for queue deletion at shutdown time */
> + atomic_t queues_remaining;
> + wait_queue_head_t queue_delete_wait;
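If I'm reading it right, the scheme boils down to roughly the sketch below,
assuming a struct work_struct delete_work field added to struct nvme_queue.
nvme_del_queue_work(), nvme_delete_queue_sync() and the glue around them are
my own illustrative names for this sketch, not code copied from the full
patch:

#include <linux/atomic.h>
#include <linux/wait.h>
#include <linux/workqueue.h>

static void nvme_del_queue_work(struct work_struct *work)
{
	struct nvme_queue *nvmeq =
		container_of(work, struct nvme_queue, delete_work);
	struct nvme_dev *dev = nvmeq->dev;

	/* Normal synchronous submission: delete the SQ, then the CQ. */
	nvme_delete_queue_sync(nvmeq, nvme_admin_delete_sq);
	nvme_delete_queue_sync(nvmeq, nvme_admin_delete_cq);

	/* Last work item to finish wakes the waiter in shutdown. */
	if (atomic_dec_and_test(&dev->queues_remaining))
		wake_up(&dev->queue_delete_wait);
}

static void nvme_disable_io_queues(struct nvme_dev *dev)
{
	int i;

	init_waitqueue_head(&dev->queue_delete_wait);
	atomic_set(&dev->queues_remaining, dev->queue_count - 1);

	for (i = dev->queue_count - 1; i > 0; i--) {
		struct nvme_queue *nvmeq = dev->queues[i];

		INIT_WORK(&nvmeq->delete_work, nvme_del_queue_work);
		/* Unbound workqueue: the scheduler spreads the work items
		 * across all available CPUs, so deletions run in parallel. */
		queue_work(system_unbound_wq, &nvmeq->delete_work);
	}

	/* Wait until every per-queue work item has completed. */
	wait_event(dev->queue_delete_wait,
		   atomic_read(&dev->queues_remaining) == 0);
}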
General question: any reason you didn't just use a counting semaphore for
this? I've seen other places where people are moving away from semaphores,
but I didn't understand why...
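To make the question concrete, a counting semaphore could replace the
queues_remaining/queue_delete_wait pair with something like this (again just
a sketch, and queue_delete_sem is an illustrative field name, not one in the
patch):

#include <linux/semaphore.h>
#include <linux/workqueue.h>

static void nvme_del_queue_work(struct work_struct *work)
{
	struct nvme_queue *nvmeq =
		container_of(work, struct nvme_queue, delete_work);

	/* ... synchronous delete of the SQ and CQ, as in the patch ... */

	up(&nvmeq->dev->queue_delete_sem);	/* one up() per finished queue */
}

static void nvme_disable_io_queues(struct nvme_dev *dev)
{
	int i, nr_io_queues = dev->queue_count - 1;

	sema_init(&dev->queue_delete_sem, 0);

	/* ... queue one work item per I/O queue on system_unbound_wq ... */

	/* One down() per queue replaces wait_event() on the atomic. */
	for (i = 0; i < nr_io_queues; i++)
		down(&dev->queue_delete_sem);
}

The end result should be the same either way; I'm mostly wondering whether
there's a reason to prefer the atomic + wait queue pattern over a semaphore
here.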