[PATCH v10 0/5] shut down devices asynchronously
Laurence Oberman
loberman at redhat.com
Mon Jun 30 15:02:02 PDT 2025
On Mon, 2025-06-30 at 20:33 +0000, Michael Kelley wrote:
> From: Stuart Hayes <stuart.w.hayes at gmail.com> Sent: Wednesday, June 25, 2025 1:19 PM
> >
> > This adds the ability for the kernel to shut down devices
> > asynchronously.
> >
> > Only devices with drivers that enable it are shut down
> > asynchronously.
> >
> > This can dramatically reduce system shutdown/reboot time on systems that
> > have multiple devices that take many seconds to shut down (like certain
> > NVMe drives). On one system tested, the shutdown time went from 11 minutes
> > without this patch to 55 seconds with the patch.
>
> I've tested this version and all looks good. I did the same tests that I
> did with v9 [1], running in a VM in the Azure cloud. The 2 NVMe devices
> are shut down in parallel, gaining about 110 milliseconds, and there were
> no slowdowns as seen in v9. The net gain was ~100 ms.
>
> I also tested a local Hyper-V VM that does not have any NVMe devices.
> The shutdown timings with and without this patch set are pretty much
> the same, which was not the case with v9.
>
> I did not repeat the more detailed debugging from v9 as reported
> here [2], since there is no unexpected slowness with v10.
>
> For the series,
>
> Tested-by: Michael Kelley <mhklinux at outlook.com>
>
> [1] https://lore.kernel.org/lkml/BN7PR02MB41480DE777B9C224F3C2DF43D4792@BN7PR02MB4148.namprd02.prod.outlook.com/
> [2] https://lore.kernel.org/lkml/SN6PR02MB41571E2DD410D09CE7494B38D4402@SN6PR02MB4157.namprd02.prod.outlook.com/
>
> >
> > Changes from V9:
> >
> > Address resource and timing issues when spawning a unique async thread
> > for every device during shutdown:
> > * Make the asynchronous threads able to shut down multiple devices,
> >   instead of spawning a unique thread for every device.
> > * Modify core kernel async code with a custom wake function so it
> >   doesn't wake up threads waiting to synchronize every time the cookie
> >   changes.
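The first V9 change (a bounded set of async threads that each shut down several devices, rather than one thread per device) can be illustrated with a small userspace sketch. This is not the patch's code: it assumes POSIX threads, and the names shutdown_all(), worker(), shutdown_one(), struct fake_device, NUM_WORKERS and NUM_DEVICES are all hypothetical. Each worker claims the next device under a mutex and runs the slow shutdown step outside the lock, so devices are still shut down in parallel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a fixed pool of workers drains a shared list of
 * devices instead of spawning one thread per device. */
#define NUM_DEVICES 16
#define NUM_WORKERS 4

struct fake_device {
	int id;
	bool shut_down;
};

static struct fake_device devices[NUM_DEVICES];
static size_t next_device;		/* index of the next unclaimed device */
static int shut_down_count;		/* devices finished so far */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a driver's (possibly slow) shutdown callback. */
static void shutdown_one(struct fake_device *dev)
{
	dev->shut_down = true;
}

/* Each worker keeps claiming devices until none are left. */
static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		struct fake_device *dev = NULL;

		pthread_mutex_lock(&lock);
		if (next_device < NUM_DEVICES)
			dev = &devices[next_device++];
		pthread_mutex_unlock(&lock);

		if (!dev)
			break;		/* list drained: this worker exits */

		shutdown_one(dev);	/* runs outside the lock */

		pthread_mutex_lock(&lock);
		shut_down_count++;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

/* Shut down every device with NUM_WORKERS threads; returns how many. */
int shutdown_all(void)
{
	pthread_t threads[NUM_WORKERS];

	for (int i = 0; i < NUM_DEVICES; i++)
		devices[i] = (struct fake_device){ .id = i, .shut_down = false };
	next_device = 0;
	shut_down_count = 0;

	for (int i = 0; i < NUM_WORKERS; i++) {
		int rc = pthread_create(&threads[i], NULL, worker, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NUM_WORKERS; i++)
		pthread_join(threads[i], NULL);

	for (int i = 0; i < NUM_DEVICES; i++)
		assert(devices[i].shut_down);
	return shut_down_count;
}
```

Calling shutdown_all() returns NUM_DEVICES once the workers have drained the shared list. The thread count stays at NUM_WORKERS regardless of how many devices exist, which is the kind of resource issue the V9 notes address; parent/child ordering is deliberately left out of this sketch.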
> >
> > Changes from V8:
> >
> > Deal with shutdown hangs that occur when a parent/supplier device is
> > later in the devices_kset list than its children/consumers:
> > * Ignore sync_state_only devlinks for shutdown dependencies
> > * Ignore shutdown_after for devices that don't want async shutdown
> > * Add a sanity check to revert to sync shutdown for any device that
> >   would otherwise wait for a child/consumer shutdown that hasn't
> >   already been scheduled
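The V8 sanity check can be modeled as a planning pass over devices in shutdown order. The sketch below is a hypothetical, single-threaded illustration and not the patch: struct model_dev, plan_shutdown(), and the driver_wants_async field are invented names, and the code only decides sync versus async for each device without running any shutdown. A device is marked async only if its driver opts in and every child/consumer it would wait for has already been scheduled; otherwise it falls back to sync, so it never waits on work that was never queued.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 64
#define MAX_DEPS 4

enum shutdown_mode { MODE_SYNC, MODE_ASYNC };

struct model_dev {
	bool driver_wants_async;	/* hypothetical per-driver opt-in flag */
	int ndeps;			/* number of children/consumers */
	int deps[MAX_DEPS];		/* their indices in the devs[] array */
};

/*
 * Visit devices in order[0..n-1] (shutdown order) and record in modes[]
 * whether each may be shut down asynchronously. Devices are assumed to be
 * indexed 0..n-1. sync_state_only devlink filtering is not modeled.
 */
void plan_shutdown(const struct model_dev *devs, const int *order, int n,
		   enum shutdown_mode *modes)
{
	bool scheduled[MAX_DEVS] = { false };

	assert(n <= MAX_DEVS);
	for (int i = 0; i < n; i++) {
		int d = order[i];
		const struct model_dev *dev = &devs[d];
		/* Drivers that don't opt in always shut down synchronously. */
		bool async_ok = dev->driver_wants_async;

		/* Waiting on a child/consumer that was never scheduled could hang. */
		for (int j = 0; async_ok && j < dev->ndeps; j++)
			if (!scheduled[dev->deps[j]])
				async_ok = false;

		modes[d] = async_ok ? MODE_ASYNC : MODE_SYNC;
		scheduled[d] = true;
	}
}
```

For a parent (device 0) with one child (device 1), visiting the child first yields async for both, while visiting the parent first forces the parent to sync, which mirrors the fallback listed in the V8 notes.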
> >
> > Changes from V7:
> >
> > Do not expose driver async_shutdown_enable in sysfs.
> > Wrapped a long line.
> >
> > Changes from V6:
> >
> > Removed a sysfs attribute that allowed the async device shutdown to be
> > "on" (with driver opt-out), "safe" (driver opt-in), or "off"... what was
> > previously "safe" is now the only behavior, so drivers now only need to
> > have the option to enable or disable async shutdown.
> >
> > Changes from V5:
> >
> > Separated into multiple patches to make review easier.
> > Reworked some code to make it more readable
> > Made devices wait for consumers to shut down, not just children
> > (suggested by David Jeffery)
> >
> > Changes from V4:
> >
> > Change code to use cookies for synchronization rather than async domains
> > Allow async shutdown to be disabled via sysfs, and allow driver opt-in or
> > opt-out of async shutdown (when not disabled), with ability to control
> > driver opt-in/opt-out via sysfs
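The cookie scheme from V4 can also be sketched in userspace. The kernel's async_synchronize_cookie() waits until all async calls scheduled before the given cookie have finished; the model below follows that rule with a mutex and a condition variable. Everything here (model_schedule(), model_complete(), model_synchronize_cookie(), MAX_WORK) is a hypothetical stand-in assuming POSIX threads, not the code in kernel/async.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORK 32

typedef unsigned long long cookie_t;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done[MAX_WORK];
static cookie_t next_cookie;	/* cookie for the next scheduled item */

/* Reserve a cookie for a new piece of work; the work itself runs elsewhere. */
cookie_t model_schedule(void)
{
	pthread_mutex_lock(&lock);
	cookie_t c = next_cookie++;
	assert(c < MAX_WORK);
	done[c] = false;
	pthread_mutex_unlock(&lock);
	return c;
}

/* Mark work @c finished and wake every waiter so it can recheck. */
void model_complete(cookie_t c)
{
	pthread_mutex_lock(&lock);
	done[c] = true;
	pthread_cond_broadcast(&done_cond);
	pthread_mutex_unlock(&lock);
}

/* Caller holds lock: has all scheduled work before @checkpoint finished? */
static bool all_before_done(cookie_t checkpoint)
{
	for (cookie_t c = 0; c < checkpoint && c < next_cookie; c++)
		if (!done[c])
			return false;
	return true;
}

/* Block until every item with a cookie lower than @checkpoint is done. */
void model_synchronize_cookie(cookie_t checkpoint)
{
	pthread_mutex_lock(&lock);
	while (!all_before_done(checkpoint))
		pthread_cond_wait(&done_cond, &lock);
	pthread_mutex_unlock(&lock);
}
```

In this model, model_complete() broadcasts on every completion, so each waiter wakes and rechecks even when its own checkpoint is still unmet. That is the wake-up pattern the V9 change to the core async code (a custom wake function) is described as avoiding.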
> >
> > Changes from V3:
> >
> > Bug fix (used "parent" not "dev->parent" in device_shutdown)
> >
> > Changes from V2:
> >
> > Removed recursive functions to schedule children to be shut down before
> > parents, since the existing device_shutdown loop will already do this
> >
> > Changes from V1:
> >
> > Rewritten using kernel async code (suggested by Lukas Wunner)
> >
> > David Jeffery (1):
> > kernel/async: streamline cookie synchronization
> >
> > Stuart Hayes (4):
> > driver core: don't always lock parent in shutdown
> > driver core: separate function to shutdown one device
> > driver core: shut down devices asynchronously
> > nvme-pci: Make driver prefer asynchronous shutdown
> >
> > drivers/base/base.h | 8 ++
> > drivers/base/core.c | 210 +++++++++++++++++++++++++++++-----
> > drivers/nvme/host/pci.c | 1 +
> > include/linux/device/driver.h | 2 +
> > kernel/async.c | 42 ++++++-
> > 5 files changed, 236 insertions(+), 27 deletions(-)
> >
> > --
> > 2.39.3
> >
>
>
For the series:
Tested against kernel 6.16.0-rc4-dirty on x86_64.
Shutdown takes about 15 seconds, compared to almost 60 seconds without the series.
Same set of tests I always run; results are stable and repeatable.
Looks good again, although V9 also looked good until Mike Kelley found
his issues.
Tested-by: Laurence Oberman <loberman at redhat.com>