[PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force

Tue Jun 20 06:15:07 PDT 2023

On Tue, Jun 20, 2023 at 07:41:41AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > > will be failed.
> > > 
> > > del_gendisk also sets GD_DEAD early on.
> > 
> > No.
> > 
> > The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> > bio_queue_enter().
> 
> IFF we know we can't do I/O by the time del_gendisk is called, we
> need to call mark_disk_dead first and not paper over the problem.

In theory, device removal can happen any time, when it isn't clear
if the controller is recovered well at that time, that is why this
API is added for avoiding to fail IO unnecessarily.

However maybe it is just fine to mark controller as dead in case that
removal breaks current error recovery given current nvme driver error
handling isn't very fine-grained control, so how about something like
the following:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c3d72fc677f7..120d98f348de 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -558,6 +558,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
        }

        if (changed) {
+               ctrl->old_state = ctrl->state;
                ctrl->state = new_state;
                wake_up_all(&ctrl->state_wq);
        }
@@ -4654,7 +4655,7 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
         * removing the namespaces' disks; fail all the queues now to avoid
         * potentially having to clean up the failed sync later.
         */
-       if (ctrl->state == NVME_CTRL_DEAD) {
+       if (ctrl->state == NVME_CTRL_DEAD || ctrl->old_state != NVME_CTRL_LIVE) {
                nvme_mark_namespaces_dead(ctrl);
                nvme_unquiesce_io_queues(ctrl);
        }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 953e59f56139..7da53cc76f11 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -246,8 +246,9 @@ enum nvme_ctrl_flags {

 struct nvme_ctrl {
        bool comp_seen;
-       enum nvme_ctrl_state state;
        bool identified;
+       enum nvme_ctrl_state old_state;
+       enum nvme_ctrl_state state;
        spinlock_t lock;
        struct mutex scan_lock;
        const struct nvme_ctrl_ops *ops;

> 
> An API that force unfreezes is just broken and will leaves us with
> freezecount mismatches.

The freezecount mismatch problem is actually in nvme driver, please
see the previous patch[1] I posted.


[1] https://lore.kernel.org/linux-block/20230613005847.1762378-1-ming.lei@redhat.com/T/#m17ac1aa71056b6517f8aefbae58c301f296f0a73


Thanks,
Ming