[PATCH v1 1/3] driver core: Support asynchronous driver shutdown
Jonathan Derrick
jonathan.derrick at linux.dev
Thu Mar 31 09:57:40 PDT 2022
On 3/28/2022 6:19 PM, Oliver O'Halloran wrote:
> On Tue, Mar 29, 2022 at 10:35 AM Tanjore Suresh <tansuresh at google.com> wrote:
>>
>> This changes the bus driver interface with additional entry points
>> to enable devices to implement asynchronous shutdown. The existing
>> synchronous interface to shutdown is unmodified and retained for
>> backward compatibility.
>>
>> This changes the common device shutdown code to enable devices to
>> participate in asynchronous shutdown implementation.
>
> nice to see someone looking at improving the shutdown path
>
>> Signed-off-by: Tanjore Suresh <tansuresh at google.com>
>> ---
>> drivers/base/core.c | 39 +++++++++++++++++++++++++++++++++++++-
>> include/linux/device/bus.h | 10 ++++++++++
>> 2 files changed, 48 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 3d6430eb0c6a..359e7067e8b8 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
>> *snip*
>
> This all seems a bit dangerous and I'm wondering what systems you've
> tested these changes with. I had a look at implementing something
> similar a few years ago and one case that always concerned me was
> embedded systems where the PCIe root complex also has a driver bound.
> Say you've got the following PCIe topology:
>
> 00:00.0 - root port
> 01:00.0 - nvme drive
>
> With the current implementation of device_shutdown() we can guarantee
> that the child device (the nvme) is shut down before we start trying
> to shut down the parent device (the root complex) so there's no
> possibility of deadlocks and other dependency headaches. With this
> implementation of async shutdown we lose that guarantee and I'm not
> sure what the consequences are. Personally I was never able to
> convince myself it was safe, but maybe you're braver than I am :)
>
> That all said, there's probably only a few kinds of device that will
> really want to implement async shutdown support so maybe you can
> restrict it to leaf devices and flip the ordering around to something
> like:
It seems like it might be helpful to split the async shutdowns into
refcounted hierarchies and proceed with the next level up when all the
refs are in.
Ex:
00:00.0 - RP
  01:00.0 - NVMe A
  02:00.0 - Bridge USP
    03:00.0 - Bridge DSP
      04:00.0 - NVMe B
    03:00.1 - Bridge DSP
      05:00.0 - NVMe C
NVMe A could start shutting down at the beginning of the hierarchy
traversal. Async shutdown of bus 3 then wouldn't start until all
children of bus 3 are shut down.
You could probably do this by having the async_shutdown_list in the pci_bus.
>
> for_each_device(dev) {
> if (can_async(dev) && has_no_children(dev))
> start_async_shutdown(dev)
> }
> wait_for_all_async_shutdowns_to_finish()
>
> // tear down the remaining system devices synchronously
> for_each_device(dev)
> do_sync_shutdown(dev)
>
>> /*
>> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
>> index a039ab809753..e261819601e9 100644
>> --- a/include/linux/device/bus.h
>> +++ b/include/linux/device/bus.h
>> @@ -93,6 +101,8 @@ struct bus_type {
>> void (*sync_state)(struct device *dev);
>> void (*remove)(struct device *dev);
>> void (*shutdown)(struct device *dev);
>> + void (*shutdown_pre)(struct device *dev);
>> + void (*shutdown_post)(struct device *dev);
>
> Call them shutdown_async_start() / shutdown_async_end() or something,
> IMO. These names are not at all helpful, and it's easy to confuse
> their role with the class-based shutdown_pre / _post.
>