[PATCH v5 1/2] PCI: add AMD PCIe quirk for nvme shutdown opt

Liang, Prike Prike.Liang at amd.com
Wed May 19 23:57:41 PDT 2021



> From: Bjorn Helgaas <helgaas at kernel.org>
> Sent: Thursday, May 20, 2021 5:34 AM
> To: Liang, Prike <Prike.Liang at amd.com>
> Cc: linux-pci at vger.kernel.org; kbusch at kernel.org; axboe at fb.com;
> hch at lst.de; sagi at grimberg.me; linux-nvme at lists.infradead.org; Deucher,
> Alexander <Alexander.Deucher at amd.com>; stable at vger.kernel.org; S-k,
> Shyam-sundar <Shyam-sundar.S-k at amd.com>; Chaitanya Kulkarni
> <chaitanya.kulkarni at wdc.com>; Rafael J. Wysocki <rjw at rjwysocki.net>;
> linux-pm at vger.kernel.org
> Subject: Re: [PATCH v5 1/2] PCI: add AMD PCIe quirk for nvme shutdown opt
>
> [+cc Rafael (probably nothing of interest to you), linux-pm]
>
> On Tue, May 18, 2021 at 10:24:34AM +0800, Prike Liang wrote:
> > By default, NVMe controller suspend-resume seems to only save/restore
> > the NVMe link state via the APST opt, and the NVMe device remains in D0
> > during this time. The NVMe device is then shut down by SMU firmware on
> > s2idle entry and loses its power context during s2idle resume. As a
> > result, NVMe command queue requests are processed abnormally and time
> > out. This issue can be resolved by using PCIe power setting with the
> > simple suspend-resume path instead of the APST get/set opt.
>
> I can't parse the paragraph above, sorry.  I'm sure this means something to
> NVMe developers, but since you're adding this to the PCI core, not the NVMe
> core, it needs to be intelligible to ordinary PCI folks.
>
[Prike] Sorry for the confusion. These patches address a broken s2idle resume: the NVMe driver's
default suspend-resume policy of using NVMe APST during suspend-to-idle prevents the PCI root port
from going to D3.
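To make the two suspend strategies concrete, here is a small userspace C model of the choice the NVMe driver has to make. It is a sketch under stated assumptions, not driver code: the struct, the enum, and choose_s2idle_path() are hypothetical names, and only the flag value 16 mirrors PCI_BUS_FLAGS_DISABLE_ON_S2I from the patch. In the kernel, the flag would be read from the device's pdev->bus->bus_flags.

```c
#include <stdbool.h>

/* Same value the patch assigns to PCI_BUS_FLAGS_DISABLE_ON_S2I. */
#define MODEL_BUS_FLAG_DISABLE_ON_S2I 16u

/* Hypothetical labels for the two s2idle strategies discussed here. */
enum s2idle_path {
	S2IDLE_PATH_APST,           /* device stays in D0, APST manages power */
	S2IDLE_PATH_SIMPLE_SUSPEND  /* controller is shut down so it can reach D3 */
};

/* Hypothetical, simplified view of what the driver can inspect. */
struct model_nvme_dev {
	unsigned int bus_flags; /* flags inherited from the PCI bus */
	bool apst_supported;    /* controller offers usable power states */
};

/*
 * Sketch: if the platform flag is set, or APST is unavailable, use the
 * simple suspend path; otherwise keep the default APST behavior.
 */
enum s2idle_path choose_s2idle_path(const struct model_nvme_dev *d)
{
	if (d->bus_flags & MODEL_BUS_FLAG_DISABLE_ON_S2I)
		return S2IDLE_PATH_SIMPLE_SUSPEND;
	if (!d->apst_supported)
		return S2IDLE_PATH_SIMPLE_SUSPEND;
	return S2IDLE_PATH_APST;
}
```

The NVMe-side check itself belongs to patch 2/2 of this series, which is not quoted in this message.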

> For example, since you only use this flag in the NVMe driver, you should
> explain why the PCI core needs to keep track of the flag for you.  Normally I
> would assume the driver could figure this out in its
> .probe() function.
>
[Prike] Yes, we could set the quirk flag in the .probe() function or add it to nvme_id_table; that was
the first solution we tried. However, it is not feasible to enumerate every NVMe device that might be installed and assign the quirk flag to each one. Instead, to handle arbitrary NVMe devices, we use the root complex device ID to identify the affected platform.
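The trade-off can be sketched as follows: instead of listing every possible NVMe vendor/device ID, the check keys once on the root complex ID. This is a userspace model with hypothetical names (model_pci_id, platform_needs_s2i_quirk()); only the AMD vendor ID 0x1022 and the device ID 0x1630 correspond to what the quirk in this patch matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Root complex IDs assumed to need the quirk; [1022:1630] is the one in the patch. */
static const struct model_pci_id s2i_quirk_root_complexes[] = {
	{ 0x1022, 0x1630 },
};

/*
 * Hypothetical helper: true if the host bridge ID matches the table.
 * The NVMe device's own IDs are never consulted, so any third-party
 * drive behind a matching root complex is covered.
 */
bool platform_needs_s2i_quirk(struct model_pci_id host_bridge)
{
	size_t n = sizeof(s2i_quirk_root_complexes) /
		   sizeof(s2i_quirk_root_complexes[0]);

	for (size_t i = 0; i < n; i++) {
		if (s2i_quirk_root_complexes[i].vendor == host_bridge.vendor &&
		    s2i_quirk_root_complexes[i].device == host_bridge.device)
			return true;
	}
	return false;
}
```

The design choice is that the table grows with affected platforms rather than with the number of NVMe models users might install.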

> Quirks are usually used to work around a defect in a device.  What's the
> defect in this case?  Ideally we can point to a section of the PCIe spec with a
> requirement that the device violates.
>
[Prike] In this case, the quirk is only used to identify the affected platform, which requires the NVMe
device to go to D3 during s2idle suspend.
> What does "opt" mean?
>
[Prike] I don't work on the NVMe driver day to day either, but from a software perspective, the APST
opt is used to handle power state save/restore (S&R) without PCI involvement during the legacy s2idle suspend-resume path.

> What is SMU firmware?  Why is it relevant?
>
[Prike] SMU firmware is a proprietary micro component responsible for device power management. Without the quirk flag, the NVMe device does not enter D3 during s2idle suspend, so the SMU firmware shuts the NVMe device down itself. Unfortunately, since the NVMe device is a third-party device, the SMU firmware only restores the NVMe root port's power state during the s2idle wake-up process. As a result, the NVMe device's power state has been lost by the time the OS resumes from s2idle, and NVMe command requests fail.
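The failure sequence above can be expressed as a toy state model. This is only an illustration of the reasoning in this reply, not real driver or SMU firmware code; every struct and function name is hypothetical. The key point it encodes is that firmware restores the root port but not the NVMe controller context, so resume only works if the driver itself shut the device down and therefore knows it must reinitialize it.

```c
#include <stdbool.h>

/* Hypothetical toy model of the s2idle sequence described above. */
struct model_s2idle_state {
	bool driver_put_device_in_d3; /* NVMe driver shut the device down before entry */
	bool nvme_context_valid;      /* controller still holds queues and settings */
	bool root_port_powered;
};

/* s2idle entry: power is removed; the controller context does not survive. */
void model_s2idle_entry(struct model_s2idle_state *s)
{
	s->root_port_powered = false;
	s->nvme_context_valid = false;
}

/* s2idle exit: per this reply, firmware restores only the root port. */
void model_s2idle_exit(struct model_s2idle_state *s)
{
	s->root_port_powered = true;
	/* nvme_context_valid is deliberately left false */
}

/*
 * Commands succeed after resume only if the link is back and either the
 * driver reinitializes the controller (it shut it down itself) or the
 * context survived.
 */
bool model_resume_commands_ok(const struct model_s2idle_state *s)
{
	return s->root_port_powered &&
	       (s->driver_put_device_in_d3 || s->nvme_context_valid);
}
```

With driver_put_device_in_d3 false (the default APST path), the model ends in the command-failure state the reply describes.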

> Is this a problem only with s2idle?  Why or why not?
>
[Prike] Yes, this issue is only seen with s2idle, because s2idle checks whether each device has
entered the minimum power state defined for it in the LPI constraints table.
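A loose sketch of that kind of constraint check follows. It is an assumption-laden userspace model: the table layout, the D-state encoding, and model_count_lpi_violations() are hypothetical and do not reproduce the actual kernel or firmware logic. It shows why an NVMe device left in D0 by APST counts against a constraint that requires D3.

```c
#include <stdbool.h>
#include <stddef.h>

/* PCI power states encoded so that a larger value means a deeper state. */
enum model_dstate { MODEL_D0 = 0, MODEL_D1, MODEL_D2, MODEL_D3 };

/* Hypothetical entry of an LPI constraints table. */
struct model_lpi_constraint {
	const char *device;           /* illustrative device label */
	enum model_dstate min_dstate; /* shallowest state that satisfies the constraint */
	enum model_dstate current;    /* state the device actually reached */
};

/* Count devices that did not reach their required power state. */
size_t model_count_lpi_violations(const struct model_lpi_constraint *t, size_t n)
{
	size_t violations = 0;

	for (size_t i = 0; i < n; i++)
		if (t[i].current < t[i].min_dstate)
			violations++;
	return violations;
}
```

In this model, a device kept in D0 under a D3 constraint is reported as a violation; moving it to D3 clears it.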

> The quirk applies to [1022:1630].  An lspci I found on the web says this is a
> "00:00.0 Host bridge: AMD Renoir Root Complex" device.  So it looks like this
> will result in PCI_BUS_FLAGS_DISABLE_ON_S2I being set for every PCI bus in
> the entire system.  But the description talks about an issue specifically with
> NVMe.
>
> Is there a defect in this AMD PCIe controller that affects all devices?
>
[Prike] This solution checks the root complex device ID (DID) to identify the affected platform that
needs the quirk flag. So far, only the NVMe driver needs to check this flag, for special handling of
NVMe s2idle suspend.

> > This patch adds a PCIe RC bus flag to identify whether the platform
> > needs the quirk.
> >
> > Cc: <stable at vger.kernel.org> # 5.10+
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k at amd.com>
> > [ck: split patches for nvme and pcie]
> > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>
> > Suggested-by: Keith Busch <kbusch at kernel.org>
> > Acked-by: Keith Busch <kbusch at kernel.org>
> > ---
> > Changes in v2:
> > Fix the patch format, and check the chip's root complex DID instead of
> > the PCIe RP, to avoid affecting storage devices attached to an internal
> > PCIe RP through a USB adapter.
> >
> > Changes in v3:
> > Following Christoph Hellwig's suggestion, do the NVMe/PCIe-related
> > platform identification in the PCI quirk code rather than in the NVMe module.
> >
> > Changes in v4:
> > Split the fix into PCIe and NVMe parts, call pci_dev_put() to drop the
> > device reference count, and refine the commit message.
> >
> > Changes in v5:
> > Following Christoph Hellwig and Keith Busch, use a passthrough
> > device (bus) global flag to identify the NVMe shutdown opt rather than
> > looking up the device BDF.
> > ---
> >  drivers/pci/probe.c  | 5 ++++-
> >  drivers/pci/quirks.c | 7 +++++++
> >  include/linux/pci.h  | 2 ++
> >  3 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 953f15a..34ba691e 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -558,10 +558,13 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
> >     INIT_LIST_HEAD(&b->resources);
> >     b->max_bus_speed = PCI_SPEED_UNKNOWN;
> >     b->cur_bus_speed = PCI_SPEED_UNKNOWN;
> > +   if (parent) {
> >  #ifdef CONFIG_PCI_DOMAINS_GENERIC
> > -   if (parent)
> >             b->domain_nr = parent->domain_nr;
> >  #endif
> > +           if (parent->bus_flags & PCI_BUS_FLAGS_DISABLE_ON_S2I)
> > +                   b->bus_flags |= PCI_BUS_FLAGS_DISABLE_ON_S2I;
> > +   }
> >     return b;
> >  }
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 653660e3..7c4bb8e 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -312,6 +312,13 @@ static void quirk_nopciamd(struct pci_dev *dev)
> >  }
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD,       PCI_DEVICE_ID_AMD_8151_0,       quirk_nopciamd);
> >
> > +static void quirk_amd_s2i_fixup(struct pci_dev *dev)
> > +{
> > +   dev->bus->bus_flags |= PCI_BUS_FLAGS_DISABLE_ON_S2I;
> > +   pci_info(dev, "AMD simple suspend opt enabled\n");
> > +}
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1630, quirk_amd_s2i_fixup);
> > +
> >  /* Triton requires workarounds to be used by the drivers */
> >  static void quirk_triton(struct pci_dev *dev)
> >  {
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 53f4904..dc65219 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -240,6 +240,8 @@ enum pci_bus_flags {
> >     PCI_BUS_FLAGS_NO_MMRBC  = (__force pci_bus_flags_t) 2,
> >     PCI_BUS_FLAGS_NO_AERSID = (__force pci_bus_flags_t) 4,
> >     PCI_BUS_FLAGS_NO_EXTCFG = (__force pci_bus_flags_t) 8,
> > +   /* Driver must pci_disable_device() for suspend-to-idle */
> > +   PCI_BUS_FLAGS_DISABLE_ON_S2I    = (__force pci_bus_flags_t) 16,
> >  };
> >
> >  /* Values from Link Status register, PCIe r3.1, sec 7.8.8 */
> > --
> > 2.7.4
> >
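The probe.c and quirks.c hunks above work together: the quirk sets the flag on the bus holding the [1022:1630] device, and pci_alloc_bus() copies it to every child bus allocated later. The userspace model below mirrors that logic under stated assumptions; model_bus, model_init_bus(), and model_quirk_s2i_fixup() are hypothetical stand-ins for struct pci_bus, pci_alloc_bus(), and quirk_amd_s2i_fixup().

```c
#include <stddef.h>

/* Same value the patch assigns to PCI_BUS_FLAGS_DISABLE_ON_S2I. */
#define MODEL_BUS_FLAG_DISABLE_ON_S2I 16u

struct model_bus {
	struct model_bus *parent;
	unsigned int bus_flags;
};

/* Mirrors the pci_alloc_bus() hunk: a child bus inherits the flag from its parent. */
void model_init_bus(struct model_bus *b, struct model_bus *parent)
{
	b->parent = parent;
	b->bus_flags = 0;
	if (parent && (parent->bus_flags & MODEL_BUS_FLAG_DISABLE_ON_S2I))
		b->bus_flags |= MODEL_BUS_FLAG_DISABLE_ON_S2I;
}

/* Mirrors quirk_amd_s2i_fixup(): flag the bus the matching host bridge sits on. */
void model_quirk_s2i_fixup(struct model_bus *host_bridge_bus)
{
	host_bridge_bus->bus_flags |= MODEL_BUS_FLAG_DISABLE_ON_S2I;
}
```

In this model, the flag reaches a child bus only if the quirk runs before that bus is allocated, and once it does, every bus beneath the flagged one carries it, which is the system-wide effect Bjorn points out.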


