[PATCH 1/4] driver core: Support two-pass driver shutdown

Bjorn Helgaas helgaas at kernel.org
Fri Jan 5 10:15:17 PST 2024


On Fri, Jan 05, 2024 at 05:29:55AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 02, 2024 at 10:07:30AM -0800, Jeremy Allison wrote:
> >> But I'm not a workqueue expert and I do see the scary warning at
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
> >> about not flushing system-wide workqueues, so maybe this wouldn't be
> >> workable.
> >
> > This is a bigger change than I'm comfortable with
> > (or indeed understand :-) right now. Can we fix
> > the immediate problem first please and then look
> > for improvements later ?
> 
> It's also a lot less efficient.  Assuming NVMe isn't alone and other
> hardware interfaces also have a shutdown/disable busy wait (and I've seen
> quite a few that do) as their limiting factor the two-pass shutdown seems
> much nicer than spawning tons of work items.

Here's the essence of the two-pass proposal, adding .shutdown_wait():

    device_shutdown
      for_each_device
        dev->bus->shutdown()
          ...
            nvme_shutdown
              nvme_disable_prepare_reset
                nvme_dev_disable
                  nvme_disable_ctrl
  +                 case NVME_DISABLE_SHUTDOWN_ASYNC:
  +                   return;
  +                 case NVME_DISABLE_SHUTDOWN_SYNC:
                      nvme_wait_ready(...);
  +     if (dev->bus->shutdown_wait)
  +       list_add(&shutdown_wait_list)
  +
  +   for_each(&shutdown_wait_list)
  +     dev->bus->shutdown_wait()
  +       pci_device_shutdown_wait
  +         nvme_shutdown_wait
  +           nvme_wait_ready(...)

FWIW, what I imagined was something like this, where we don't add
.shutdown_wait(), but any .shutdown() method could become asynchronous
by queuing a work item to do the wait:

  + nvme_wait_ready_work
  +   nvme_wait_ready(...)

    device_shutdown
  +   async_shutdown_wq = create_workqueue("async_shutdown");
      for_each_device
        dev->bus->shutdown()
          ...
            nvme_shutdown
              nvme_disable_prepare_reset
                nvme_dev_disable
                  nvme_disable_ctrl
  +                 case NVME_DISABLE_SHUTDOWN_ASYNC:
  +                   queue_work(async_shutdown_wq, &nvme_wait_ready_work)
  +                   return;
  +                 case NVME_DISABLE_SHUTDOWN_SYNC:
                      nvme_wait_ready(...);
  +
  +   destroy_workqueue(async_shutdown_wq);

This is a bit of hand waving because I'm not a workqueue expert, but
it seems like basically the same amount of overhead without having to
add .shutdown_wait() to the bus_type and the pci_driver.

Bjorn



More information about the Linux-nvme mailing list