[PATCH v2 2/4] PCI: Indicate context lost if L1ss exit is broken during resume from system suspend

Manivannan Sadhasivam mani at kernel.org
Sat May 23 02:14:31 PDT 2026


On Fri, May 22, 2026 at 06:21:10PM -0500, Bjorn Helgaas wrote:
> On Tue, May 19, 2026 at 01:41:21PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam at oss.qualcomm.com>
> > 
> > The PCIe spec v7.0, sec 5.5.3.3.1, states that for exiting L1.2 due to an
> > endpoint asserting CLKREQ# signal, the refclk must be turned on no earlier
> > than TL10_REFCLK_ON, and within the latency advertised in the LTR message.
> > This same behavior applies to L1.1 as well.
> 
> It sounds like only the "within the latency advertised in the LTR
> message" part is relevant in this case, and there's no issue with the
> "no earlier than TL10_REFCLK_ON" part?
> 

Yes, that's right. I quoted the excerpt from the spec as-is, but there is no issue
with the "no earlier than TL10_REFCLK_ON" part. Only the LTR latency requirement
is violated here.
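
For reference, the LTR latency that the REFCLK restore has to fit into can be
decoded as sketched below. This is a minimal user-space sketch, not kernel code.
It assumes the standard LTR latency field layout: value in bits 9:0, scale in
bits 12:10 as a multiplier of 32^scale ns, and the Requirement bit in bit 15.
The helper names ltr_field_to_ns() and refclk_restore_meets_ltr(), and the
restore time passed in, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode one 16-bit LTR latency field (Snoop or No-Snoop) into nanoseconds.
 * Bits 9:0 hold the latency value, bits 12:10 the scale (value * 32^scale ns)
 * and bit 15 the Requirement bit. Returns false if the field carries no
 * requirement or uses a reserved scale (6 or 7).
 */
static bool ltr_field_to_ns(uint16_t field, uint64_t *ns)
{
	uint64_t value = field & 0x3ff;
	unsigned int scale = (field >> 10) & 0x7;

	if (!(field & 0x8000) || scale > 5)
		return false;

	*ns = value << (5 * scale);	/* 32^scale == 2^(5 * scale) */
	return true;
}

/*
 * Hypothetical check: does the time the RC needs to bring REFCLK back
 * (system resume + PHY restore) fit within the latency the endpoint
 * advertised? Treating "no requirement advertised" as a pass is an
 * assumption of this sketch.
 */
static bool refclk_restore_meets_ltr(uint16_t ltr_field, uint64_t restore_ns)
{
	uint64_t budget_ns;

	if (!ltr_field_to_ns(ltr_field, &budget_ns))
		return true;

	return restore_ns <= budget_ns;
}
```

For example, a field with value 3 and scale 3 advertises 3 * 32768 ns =
98,304 ns. Any resume path that needs longer than that to get REFCLK back on
misses the window, which is the situation the commit message describes.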

> > On some platforms like Qcom, these requirements are satisfied during OS
> > runtime, but not while resuming from the system suspend. This happens
> > because the PCIe RC driver may remove all resource votes and turn off the
> > analog circuitry of the PHY during suspend to maximize power savings while
> > keeping the link in L1ss.
> > 
> > Consequently, when the endpoint asserts CLKREQ# to wake up, the OS must
> > first resume and the RC driver must restore the PHY and enable the REFCLK.
> > When this recovery process exceeds the L1ss exit latency time (roughly
> > TL10_REFCLK_ON + T_COMMONMODE), the endpoint may treat it as a fatal
> > condition and triger Link Down (LDn). If the endpoint device is used to
> > host the RootFS, it will result in an OS crash. For other endpoints, it
> > may result in a complete device reset/recovery.
> 
> s/triger/trigger/
> 
> > So to indicate this platform limitation to the client drivers, introduce a
> > new flag 'pci_host_bridge::broken_l1ss_resume' and check it in the
> > pci_suspend_retains_context() API. If the flag is set by the RC driver, the
> > API will return 'false' indicating to the client drivers that the device
> > context may not be retained and the drivers must be prepared for context
> > loss.
> 
> Thanks for the details, this makes sense to me now.
> 

Since we got an Ack from the NVMe maintainer, will you be queuing the series for
v7.2? I'd like it to soak in linux-next for some time, even though the impact
is minimal.

- Mani

> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam at oss.qualcomm.com>
> > ---
> >  drivers/pci/pci.c   | 11 +++++++++++
> >  include/linux/pci.h |  2 ++
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 38cc5172d259..a7d2cb69b42e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2910,6 +2910,8 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
> >   */
> >  bool pci_suspend_retains_context(struct pci_dev *pdev)
> >  {
> > +	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> > +
> >  	/*
> >  	 * If the platform firmware (like ACPI) is involved at the end of system
> >  	 * suspend, device context may not be retained.
> > @@ -2917,6 +2919,15 @@ bool pci_suspend_retains_context(struct pci_dev *pdev)
> >  	if (pm_suspend_via_firmware())
> >  		return false;
> >  
> > +	/*
> > +	 * Some host bridges power off the PHY to enter deep low-power modes
> > +	 * during system suspend. Exiting L1 PM Substates from this condition
> > +	 * violates strict timing requirements and results in Link Down (LDn).
> > +	 * On such platforms, the endpoint must be prepared for context loss.
> > +	 */
> > +	if (bridge && bridge->broken_l1ss_resume)
> > +		return false;
> > +
> >  	/* Assume that the context is retained by default */
> >  	return true;
> >  }
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index f60f9e4e7b39..1e5b59fa258a 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -660,6 +660,8 @@ struct pci_host_bridge {
> >  	unsigned int	preserve_config:1;	/* Preserve FW resource setup */
> >  	unsigned int	size_windows:1;		/* Enable root bus sizing */
> >  	unsigned int	msi_domain:1;		/* Bridge wants MSI domain */
> > +	unsigned int	broken_l1ss_resume:1;	/* Resuming from L1ss during
> > +						   system suspend is broken */
> >  
> >  	/* Resource alignment requirements */
> >  	resource_size_t (*align_resource)(struct pci_dev *dev,
> > 
> > -- 
> > 2.48.1
> > 
> > 
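
The decision logic the patch adds to pci_suspend_retains_context() can be
modelled in plain C as follows. This is a hypothetical user-space sketch:
struct pci_dev is reduced to a direct bridge pointer standing in for
pci_find_host_bridge(pdev->bus), the global fw_involved_in_suspend stands in
for pm_suspend_via_firmware(), and the client-side enum and
pick_suspend_strategy() are illustrative names, not an existing driver API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct pci_host_bridge {
	unsigned int broken_l1ss_resume:1;	/* set by the RC driver */
};

struct pci_dev {
	/* Stands in for pci_find_host_bridge(pdev->bus); may be NULL. */
	struct pci_host_bridge *bridge;
};

/* Stand-in for pm_suspend_via_firmware(); a global is a modelling shortcut. */
static bool fw_involved_in_suspend;

/* Mirrors the order of checks in the patched pci_suspend_retains_context(). */
static bool model_suspend_retains_context(const struct pci_dev *pdev)
{
	const struct pci_host_bridge *bridge = pdev->bridge;

	/* Platform firmware involved at the end of suspend: context may be lost. */
	if (fw_involved_in_suspend)
		return false;

	/* RC cannot meet the L1ss exit timing on resume: expect Link Down. */
	if (bridge && bridge->broken_l1ss_resume)
		return false;

	/* Assume that the context is retained by default. */
	return true;
}

/* Hypothetical client-driver decision based on the helper's answer. */
enum suspend_strategy {
	SUSPEND_RETAIN_CONTEXT,		/* e.g. keep the link in L1ss */
	SUSPEND_EXPECT_CONTEXT_LOSS,	/* e.g. shut down, re-init on resume */
};

static enum suspend_strategy pick_suspend_strategy(const struct pci_dev *pdev)
{
	return model_suspend_retains_context(pdev) ?
		SUSPEND_RETAIN_CONTEXT : SUSPEND_EXPECT_CONTEXT_LOSS;
}
```

In this model, a client driver takes the context-loss path whenever the helper
returns false, whether because firmware is involved or because the RC driver
set broken_l1ss_resume, so it does not need to know which of the two applied.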

-- 
மணிவண்ணன் சதாசிவம்



More information about the Linux-nvme mailing list