[PATCH v2 2/4] PCI: Indicate context lost if L1ss exit is broken during resume from system suspend
Manivannan Sadhasivam
mani at kernel.org
Sat May 23 02:14:31 PDT 2026
On Fri, May 22, 2026 at 06:21:10PM -0500, Bjorn Helgaas wrote:
> On Tue, May 19, 2026 at 01:41:21PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam at oss.qualcomm.com>
> >
> > The PCIe spec v7.0, sec 5.5.3.3.1, states that for exiting L1.2 due to an
> > endpoint asserting CLKREQ# signal, the refclk must be turned on no earlier
> > than TL10_REFCLK_ON, and within the latency advertised in the LTR message.
> > This same behavior applies to L1.1 as well.
>
> It sounds like only the "within the latency advertised in the LTR
> message" part is relevant in this case, and there's no issue with the
> "no earlier than TL10_REFCLK_ON" part?
>
Yes, that's true. I took the excerpt from the spec here, but there is no issue
with enabling REFCLK no earlier than TL10_REFCLK_ON.
> > On some platforms like Qcom, these requirements are satisfied during OS
> > runtime, but not while resuming from the system suspend. This happens
> > because the PCIe RC driver may remove all resource votes and turns off the
> > analog circuitry of PHY during suspend to maximize power savings while
> > keeping the link in L1ss.
> >
> > Consequently, when the endpoint asserts CLKREQ# to wake up, the OS must
> > first resume and the RC driver must restore the PHY and enable the REFCLK.
> > When this recovery process exceeds the L1ss exit latency time (roughly
> > L10_REFCLK_ON + T_COMMONMODE), the endpoint may treat it as a fatal
> > condition and triger Link Down (LDn). If the endpoint device is used to
> > host the RootFS, it will result in an OS crash. For other endpoints, it
> > may result in a complete device reset/recovery.
>
> s/triger/trigger/
>
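To make the timing condition above concrete, here is a minimal sketch of the
comparison, assuming the exit budget is approximated as TL10_REFCLK_ON +
T_COMMONMODE as described in the commit message. The helper name, its
parameters, and the choice of microseconds are hypothetical and not part of
the patch; real budgets come from the spec and the device's advertised LTR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the time
 * needed to restore the PHY and REFCLK exceeds the approximate L1ss exit
 * budget of TL10_REFCLK_ON + T_COMMONMODE. All values are in microseconds.
 */
static bool l1ss_exit_budget_exceeded(uint32_t recovery_us,
				      uint32_t tl10_refclk_on_us,
				      uint32_t t_commonmode_us)
{
	/* Widen before adding so the sum cannot wrap around */
	uint64_t budget_us = (uint64_t)tl10_refclk_on_us + t_commonmode_us;

	return recovery_us > budget_us;
}
```

A recovery path that has to resume the OS first and then re-enable the PHY
would plausibly land far above such a budget, which is the case the new flag
is meant to describe.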
> > So to indicate this platform limitation to the client drivers, introduce a
> > new flag 'pci_host_bridge::broken_l1ss_resume' and check it in the
> > pci_suspend_retains_context() API. If the flag is set by the RC driver, the
> > API will return 'false' indicating the client drivers that the device
> > context may not be retained and the drivers must be prepared for context
> > loss.
>
> Thanks for the details, this makes sense to me now.
>
Since we got an ack from the NVMe maintainer, will you be queuing the series for
v7.2? I'd like this series to soak in linux-next for some time, though the
impact is minimal.
- Mani
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam at oss.qualcomm.com>
> > ---
> > drivers/pci/pci.c | 11 +++++++++++
> > include/linux/pci.h | 2 ++
> > 2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 38cc5172d259..a7d2cb69b42e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2910,6 +2910,8 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
> > */
> > bool pci_suspend_retains_context(struct pci_dev *pdev)
> > {
> > + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> > +
> > /*
> > * If the platform firmware (like ACPI) is involved at the end of system
> > * suspend, device context may not be retained.
> > @@ -2917,6 +2919,15 @@ bool pci_suspend_retains_context(struct pci_dev *pdev)
> > if (pm_suspend_via_firmware())
> > return false;
> >
> > + /*
> > + * Some host bridges power off the PHY to enter deep low-power modes
> > + * during system suspend. Exiting L1 PM Substates from this condition
> > + * violates strict timing requirements and results in Link Down (LDn).
> > + * On such platforms, the endpoint must be prepared for context loss.
> > + */
> > + if (bridge && bridge->broken_l1ss_resume)
> > + return false;
> > +
> > /* Assume that the context is retained by default */
> > return true;
> > }
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index f60f9e4e7b39..1e5b59fa258a 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -660,6 +660,8 @@ struct pci_host_bridge {
> > unsigned int preserve_config:1; /* Preserve FW resource setup */
> > unsigned int size_windows:1; /* Enable root bus sizing */
> > unsigned int msi_domain:1; /* Bridge wants MSI domain */
> > + unsigned int broken_l1ss_resume:1; /* Resuming from L1ss during
> > + system suspend is broken */
> >
> > /* Resource alignment requirements */
> > resource_size_t (*align_resource)(struct pci_dev *dev,
> >
> > --
> > 2.48.1
> >
> >
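For illustration, the decision order in the hunk above can be modeled in
userspace roughly as follows. This is only a sketch: the mock_* types and
functions are stand-ins invented here for struct pci_host_bridge, struct
pci_dev, pm_suspend_via_firmware() and pci_suspend_retains_context(), and the
client helper is a hypothetical example of how a driver might consume the
result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct pci_host_bridge; only the new flag is modeled */
struct mock_host_bridge {
	unsigned int broken_l1ss_resume:1;
};

/* Stand-in for struct pci_dev; a NULL bridge models a failed lookup */
struct mock_pci_dev {
	struct mock_host_bridge *bridge;
};

/* Stand-in for pm_suspend_via_firmware(); set by the caller */
static bool mock_suspend_via_firmware;

/* Mirrors the check order of pci_suspend_retains_context() in the patch */
static bool mock_suspend_retains_context(const struct mock_pci_dev *pdev)
{
	const struct mock_host_bridge *bridge = pdev->bridge;

	/* Firmware involvement at the end of suspend may lose context */
	if (mock_suspend_via_firmware)
		return false;

	/* The RC driver flagged L1ss exit as broken during resume */
	if (bridge && bridge->broken_l1ss_resume)
		return false;

	/* Assume that the context is retained by default */
	return true;
}

/* Hypothetical client policy: shut down fully if context may be lost */
static bool mock_client_needs_full_shutdown(const struct mock_pci_dev *pdev)
{
	return !mock_suspend_retains_context(pdev);
}
```

With the flag set by the RC driver, a client would take the full shutdown
path during suspend instead of relying on the device keeping its state.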
--
மணிவண்ணன் சதாசிவம்