[PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms

Sun May 10 22:18:23 PDT 2026

On Thu, May 07, 2026 at 11:25:23AM +0100, Jon Hunter wrote:
> Hi Bjorn, Mani,
> 
> On 22/01/2026 15:29, Bjorn Helgaas wrote:
> > [+cc NVMe folks]
> > 
> > On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> > > ...
> > 
> > > Since this commit was added in Linux v6.18, I have been observing suspend
> > > test failures on some of our boards. The suspend test suspends the devices
> > > for 20 secs; before this change the board would resume in ~27 secs
> > > (including the 20 sec sleep). After this change the board takes over 80
> > > secs to resume, which triggers a failure.
> > > 
> > > Looking at the logs, I can see it is the NVMe device on the board that is
> > > having an issue, and I see the reset failing ...
> > > 
> > >   [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full - flow control rx/tx
> > >   [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID 0 timeout, reset controller
> > >   [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> > >   [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> > >   [ 1003.050481] OOM killer enabled.
> > >   [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> > > 
> > > From the above timestamps, the delay is coming from the NVMe. I see this
> > > issue on several boards with different NVMe devices, and I can work around
> > > it by disabling ASPM L0s/L1 for these devices ...
> > > 
> > >   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> > >   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> > >   DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> > >   DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> > > 
> > > I am curious whether you have seen any similar issues.
> > > 
> > > Other PCIe devices (such as the Realtek r8169) seem to be OK; only
> > > the NVMe is having issues. So I am trying to figure out the best way
> > > to resolve this.
> > 
> > For context, "this commit" refers to f3ac2ff14834, modified by
> > df5192d9bb0e:
> > 
> >    f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> >    df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
> > 
> > The fact that this suspend issue only affects NVMe reminds me of the
> > code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
> > enabled because of some NVMe expectation:
> > 
> >    dw_pcie_suspend_noirq()
> >    {
> >      ...
> >      /*
> >       * If L1SS is supported, then do not put the link into L2 as some
> >       * devices such as NVMe expect low resume latency.
> >       */
> >      if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
> >        return 0;
> >      ...
> > 
> > That suggests there's some NVMe/ASPM interaction that the PCI core
> > doesn't understand yet.
> > 
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c?id=v6.18#n1146
> 
> 
> I want to revisit this issue. From my perspective, low-power suspend has been
> broken on some of our Tegra platforms (those with NVMe devices) since v6.19,
> and so far there is no resolution. The patch proposed to fix this [0] has
> been rejected by Qualcomm, and although it does work around the issue, my
> confidence that it is the right fix is now low.
> 

The referenced patch is now merged into arm-soc for v7.2:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7602c0ec0bbfd3985d49f4f0cad281c1414008c9

I hope this takes care of the issue you are dealing with.

- Mani

-- 
மணிவண்ணன் சதாசிவம்