[PATCH V3 7/8] nvme: pci: recover controller reliably
jianchao.wang
jianchao.w.wang at oracle.com
Thu May 3 02:14:30 PDT 2018
Hi Ming,
On 05/03/2018 11:17 AM, Ming Lei wrote:
> static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
> @@ -1199,7 +1204,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> if (nvme_should_reset(dev, csts)) {
> nvme_warn_reset(dev, csts);
> nvme_dev_disable(dev, false, true);
> - nvme_reset_ctrl(&dev->ctrl);
> + nvme_eh_reset(dev);
> return BLK_EH_HANDLED;
> }
>
> @@ -1242,7 +1247,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> "I/O %d QID %d timeout, reset controller\n",
> req->tag, nvmeq->qid);
> nvme_dev_disable(dev, false, true);
> - nvme_reset_ctrl(&dev->ctrl);
> + nvme_eh_reset(dev);
Without the 8th patch, invoking nvme_eh_reset in nvme_timeout is dangerous.
nvme_pre_reset_dev sends a lot of admin IOs when it initializes the controller.
If these admin IOs time out, nvme_timeout cannot handle them, because the timeout work is sleeping
while it waits for those same admin IOs.
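To make the dependency explicit, here is a minimal userspace sketch in C. It is only a model: timeout_worker, admin_io and reset_makes_progress are hypothetical names, not kernel structures or functions, and the single busy flag stands in for the timeout work being asleep inside the reset path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these types exist in the kernel. */
struct timeout_worker { bool busy; };
struct admin_io { bool timed_out; bool completed; };

/* A timed-out request can only be handled while the worker is idle. */
static bool handle_timeout(struct timeout_worker *w, struct admin_io *io)
{
	assert(w && io);
	if (w->busy || !io->timed_out)
		return false;
	io->completed = true;	/* the expired request is ended */
	return true;
}

/*
 * Models a reset that issues an admin IO which then times out.
 * in_timeout_context == true means the reset itself runs on the
 * timeout worker, so the worker is busy while the IO expires.
 * Returns true if the wait for the admin IO can ever finish.
 */
static bool reset_makes_progress(bool in_timeout_context)
{
	struct timeout_worker w = { .busy = in_timeout_context };
	struct admin_io io = { .timed_out = true, .completed = false };

	handle_timeout(&w, &io);
	return io.completed;
}
```

In this model, reset_makes_progress(true) never completes the admin IO, while reset_makes_progress(false) does, which is why the reset has to run outside the timeout context.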
In addition, even if we take nvme_wait_freeze out of nvme_eh_reset and put it into another context,
the ctrl state is still CONNECTING, so nvme_eh_reset cannot move forward.
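The state problem can be sketched the same way. The transition rule below is an assumption based on my reading of nvme_change_ctrl_state at the time (RESETTING can only be entered from NEW, LIVE or ADMIN_ONLY); the enum and request_reset are simplified, hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the controller states; not the kernel enum. */
enum ctrl_state {
	CTRL_NEW,
	CTRL_LIVE,
	CTRL_ADMIN_ONLY,
	CTRL_RESETTING,
	CTRL_CONNECTING,
	CTRL_DELETING,
	CTRL_DEAD,
};

/* Assumption: RESETTING may only be entered from NEW, LIVE or ADMIN_ONLY. */
static bool can_enter_resetting(enum ctrl_state old)
{
	switch (old) {
	case CTRL_NEW:
	case CTRL_LIVE:
	case CTRL_ADMIN_ONLY:
		return true;
	default:
		return false;
	}
}

/* Hypothetical reset request: returns 0 on success, -1 if refused. */
static int request_reset(enum ctrl_state *state)
{
	assert(state);
	if (!can_enter_resetting(*state))
		return -1;	/* state is left unchanged */
	*state = CTRL_RESETTING;
	return 0;
}
```

Under that assumption, a reset requested while the controller is CONNECTING is refused and the state stays CONNECTING, so a second reset cannot start until the first one leaves that state.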
Actually, I reported this issue to Keith before. I hit an IO hang when the controller died in
nvme_reset_work -> nvme_wait_freeze. As you know, nvme_reset_work cannot be scheduled again because it is still waiting.
Here is Keith's commit for this:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015603.html
Thanks
Jianchao