[PATCH] PCIe bridge deferred probe breaks suspend resume

Rafael J. Wysocki rafael at kernel.org
Mon Feb 26 01:38:04 PST 2018

On Sun, Feb 25, 2018 at 5:28 PM, Feng Kan <fkan at apm.com> wrote:
> On Sun, Feb 25, 2018 at 1:29 AM, Rafael J. Wysocki <rafael at kernel.org> wrote:
>> On Sat, Feb 24, 2018 at 2:59 PM, Lukas Wunner <lukas at wunner.de> wrote:
>>> On Sat, Feb 24, 2018 at 10:47:26AM +0100, Rafael J. Wysocki wrote:
>>>> On Sat, Feb 24, 2018 at 10:35 AM, Rafael J. Wysocki <rafael at kernel.org> wrote:
>>>> > On Sat, Feb 24, 2018 at 1:46 AM, Feng Kan <fkan at apm.com> wrote:
>>>> >> This is not a patch, but rather a question regarding the deferred
>>>> >> probe's effect on PCIe PM ordering. This happens on our system
>>>> >> which defer the probing of root bridge due to  the IOMMU not being
>>>> >> ready. Because of the deferred action, the bridge is moved to the
>>>> >> end of the dpm_list which results in incorrect suspend and resume
>>>> >> sequence.
>>>> >>
>>>> >> In the cases I have seen, the bridge is always reordered because of
>>>> >> startup sequence. They are always place after the endpoint. If that
>>>> >> is the case the following code should be able to prevent such cases.
>>>> >> However, is there some cases here that would violate such situation?
>>>> >
>>>> > The code in dd.c assumes that the device being probed has no children
>>>> > (or consumers, for that matter) and so it is safe to move it to the
>>>> > end of the list.
>>>> >
>>>> > If the device has children (or consumers), moving it to the end of the
>>>> > list by itself doesn't work, which is the case for you.
>>>> >
>>>> > You can try to replace the device_pm_move_last(dev) in
>>>> > deferred_probe_work_func(struct() with device_reorder_to_tail(), but
>>>> > that has to be called under device_links_read_lock/unloc () and
>>>> > device_pm_lock/unloc() (in the right order).
>>>> Alternatively, you can replace your !dev_is_pci(dev) check with a
>>>> check for the presence of children or consumers and only move the
>>>> device to the end of the PM list if there are not any.  That would
>>>> kind of make sense, but it may break other assumptions in the deferred
>>>> probing mechanism which I don't recall ATM.
>>>> Or avoid deferred probing of the host bridge driver entirely.  I guess
>>>> you can use it with limited functionality until the IOMMU driver is
>>>> ready and switch over to the fully functional operation mode when that
>>>> happens, but that would need to hook into the IOMMU code somehow.
>>> Or model the root bridge's dependency on the iommu using a device link:
>>> https://www.kernel.org/doc/html/latest/driver-api/device_link.html
>> Apparently, there are children registered under the bridge device
>> before the driver for it is probed.
> Yes, so the order seems to be like this:
> 1. root port and endporint is enumerated and added to dpm_list
> 2. root port probed and deferred
> 3. iommu probed and successful
> 4. root port probed again and successful -> moved to last in dpm_list
> 5. endpoint probed and successful
> I guess another approach would be to prevent 1 from happening by not
> adding to dpm_list until the probe is successful.

No, system-wide suspend/resume of PCIe devices is handled by the PCI
bus type layer in part, so they have to be in dpm_list from the
outset.  Root ports, as their parents, need to be there before the

More information about the linux-arm-kernel mailing list