[PATCH v8 3/4] driver core: shut down devices asynchronously
Greg Kroah-Hartman
gregkh at linuxfoundation.org
Wed Sep 25 01:55:54 PDT 2024
On Tue, Sep 24, 2024 at 11:44:02PM +0300, Andrey Skvortsov wrote:
> Hi,
>
> On 24-09-24 11:23, Greg Kroah-Hartman wrote:
> > On Mon, Sep 23, 2024 at 11:50:39PM +0300, Andrey Skvortsov wrote:
> > > Hi Stuart,
> > >
> > > On 24-08-22 15:28, Stuart Hayes wrote:
> > > > Add code to allow asynchronous shutdown of devices, ensuring that each
> > > > device is shut down before its parents & suppliers.
> > > >
> > > > Only devices with drivers that have async_shutdown_enable enabled will be
> > > > shut down asynchronously.
> > > >
> > > > This can dramatically reduce system shutdown/reboot time on systems that
> > > > have multiple devices that take many seconds to shut down (like certain
> > > > NVMe drives). On one system tested, the shutdown time went from 11 minutes
> > > > without this patch to 55 seconds with the patch.
> > > >
> > > > Signed-off-by: Stuart Hayes <stuart.w.hayes at gmail.com>
> > > > Signed-off-by: David Jeffery <djeffery at redhat.com>
> > > > ---
> > > > drivers/base/base.h | 4 +++
> > > > drivers/base/core.c | 54 ++++++++++++++++++++++++++++++++++-
> > > > include/linux/device/driver.h | 2 ++
> > > > 3 files changed, 59 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > > > index 0b53593372d7..aa5a2bd3f2b8 100644
> > > > --- a/drivers/base/base.h
> > > > +++ b/drivers/base/base.h
> > > > @@ -10,6 +10,7 @@
> > > > * shared outside of the drivers/base/ directory.
> > >
> > > This change landed in linux-next and I have a problem with shutdown on
> > > an ARM Allwinner A64 device. The device usually hangs at shutdown.
> > > git bisect pointed to "driver core: shut down devices asynchronously"
> > > as the first bad commit.
> > >
> > > I've tried to debug the problem and this is what I see:
> > >
> > > 1) device 'mmc_host mmc0' is processed in device_shutdown. For this device
> > > async_schedule_domain is called (cookie 264, for example).
> > >
> > > 2) after that, 'mmcblk mmc0:aaaa' is processed. For this device
> > > async_schedule_domain is called (cookie 296, for example).
> > >
> > > 3) 'mmc_host mmc0' is the parent of 'mmcblk mmc0:aaaa' and
> > > parent->p->shutdown_after is updated from 263 to 296.
> > >
> > > 4) After some time, shutdown_one_device_async is called for 264
> > > (mmc_host mmc0), but dev->p->shutdown_after was updated to 296 and the
> > > code first calls async_synchronize_cookie_domain for 297.
> > >
> > > 264 can't finish, because it waits for 297. The shutdown process can't continue.
> > >
> > > The problem is always with a MMC host controller.
> >
> > If you take the patch here:
> > https://lore.kernel.org/r/20240919043143.1194950-1-stuart.w.hayes@gmail.com
> > does it solve the problem?
>
> Unfortunately not. I've applied the patch to next-20240920 and also tested
> the latest next-20240924, which already has the patch integrated. In both
> cases shutdown hangs.

Ugh. Ok, this is making me very nervous so I'm going to go and revert
this series from my tree and let's take it again after -rc1 is out and
people can work through the issues there.

thanks,

greg k-h
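
The hang in steps 1) to 4) of the quoted report can be illustrated with a
small user-space model. This is only a sketch under stated assumptions: it
assumes that a wait on a cookie checkpoint returns once no in-progress entry
in the domain has a cookie below that checkpoint, and that an entry stays in
progress while its own function is still running. The struct and both
function names are hypothetical and are not taken from the patch or from the
kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one scheduled async shutdown call. */
struct model_entry {
	const char *name;      /* device name, for readability only */
	unsigned long cookie;  /* cookie handed out when the call was scheduled */
	bool in_progress;      /* true until the call has returned */
};

/* Lowest cookie that is still in progress, or ULONG_MAX if none is. */
unsigned long model_lowest_pending(const struct model_entry *e, size_t n)
{
	unsigned long low = ULONG_MAX;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (e[i].in_progress && e[i].cookie < low)
			low = e[i].cookie;
	}
	return low;
}

/*
 * Assumed wait rule: a wait on `checkpoint` may return only when every
 * in-progress entry has a cookie greater than or equal to `checkpoint`.
 */
bool model_wait_can_return(const struct model_entry *e, size_t n,
			   unsigned long checkpoint)
{
	return model_lowest_pending(e, n) >= checkpoint;
}
```

With the two entries from the report in progress (cookies 264 and 296),
model_wait_can_return(devs, 2, 297) is false, and it stays false after the
296 entry completes, because the 264 entry that issued the wait is itself
still below the checkpoint.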