[PATCH 1/1] nvme: extend and modify the APST configuration algorithm

Andy Lutomirski luto at kernel.org
Wed Apr 28 15:44:13 BST 2021


On Wed, Apr 28, 2021 at 5:43 AM hch at lst.de <hch at lst.de> wrote:
>
> Adding Andy who wrote the original APST code.
>
> On Wed, Apr 28, 2021 at 09:27:36AM +0000, Alexey Bogoslavsky wrote:
> > From: Alexey Bogoslavsky <Alexey.Bogoslavsky at wdc.com>
> >
> > The algorithm that was used until now for building the APST configuration
> > table has been found to produce entries with excessively long ITPT
> > (idle time prior to transition) for devices declaring relatively long
> > entry and exit latencies for non-operational power states. This leads
> > to unnecessary waste of power and, as a result, failure to pass
> > mandatory power consumption tests on Chromebook platforms.
> >
> > The new algorithm is based on two predefined ITPT values and two
> > predefined latency tolerances. Based on these values, as well as on
> > exit and entry latencies reported by the device, the algorithm looks
> > for up to 2 suitable non-operational power states to use as primary
> > and secondary APST transition targets. The predefined values are
> > supplied to the nvme driver as module parameters:
> >
> >  - apst_primary_timeout_ms (default: 100)
> >  - apst_secondary_timeout_ms (default: 2000)
> >  - apst_primary_latency_tol_us (default: 15000)
> >  - apst_secondary_latency_tol_us (default: 100000)
> >
> > The algorithm echoes the approach used by Intel's and Microsoft's drivers
> > on Windows. The specific default parameter values are also based on those
> > drivers. However, this patch does not add the ability to dynamically
> > regenerate the APST table in the event of switching the power source from
> > AC to battery and back. Adding this functionality may be considered in the
> > future. In the meantime, the timeouts and tolerances reflect a compromise
> > between values used by Microsoft for AC and battery scenarios.
> >
> > On most NVMe devices the new algorithm results in a more aggressive
> > power saving policy. While beneficial in most cases, this sometimes
> > comes at the price of higher IO processing latency, as well as a
> > potential impact on the drive's endurance (due to more frequent
> > context saving when entering deep non-operational states). To provide
> > a fallback for systems where these regressions cannot be tolerated,
> > the patch allows reverting to the legacy behavior by setting either
> > the apst_primary_timeout_ms or the apst_primary_latency_tol_us
> > parameter to 0. Eventually (and possibly after fine tuning the default
> > values of the module parameters) the legacy behavior can be removed.
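
For illustration only, the selection described in the quoted text could be
sketched roughly as below. This is not the patch's code. It assumes that a
state "fits" a tolerance when entry plus exit latency is within it, and that
the deepest fitting state is preferred; neither detail is stated in the
description. PowerState, deepest_within() and pick_targets() are
hypothetical names, and the latencies in the usage note are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Defaults quoted in the patch description.
APST_PRIMARY_TIMEOUT_MS = 100
APST_SECONDARY_TIMEOUT_MS = 2000
APST_PRIMARY_LATENCY_TOL_US = 15000
APST_SECONDARY_LATENCY_TOL_US = 100000


@dataclass
class PowerState:
    """Hypothetical stand-in for one power state descriptor."""
    index: int
    non_operational: bool
    entry_lat_us: int
    exit_lat_us: int


def deepest_within(states: List[PowerState], tol_us: int) -> Optional[PowerState]:
    """Return the highest-numbered non-operational state whose
    entry + exit latency is within tol_us (assumed criterion)."""
    for ps in sorted(states, key=lambda s: s.index, reverse=True):
        if ps.non_operational and ps.entry_lat_us + ps.exit_lat_us <= tol_us:
            return ps
    return None


Target = Optional[Tuple[int, int]]  # (state index, ITPT in ms)


def pick_targets(states: List[PowerState]) -> Tuple[Target, Target]:
    """Pick up to two APST targets: (primary, secondary)."""
    primary = deepest_within(states, APST_PRIMARY_LATENCY_TOL_US)
    secondary = deepest_within(states, APST_SECONDARY_LATENCY_TOL_US)
    # A secondary target is only kept if it is deeper than the primary one.
    if primary is not None and secondary is not None:
        if secondary.index <= primary.index:
            secondary = None
    first = (primary.index, APST_PRIMARY_TIMEOUT_MS) if primary else None
    second = (secondary.index, APST_SECONDARY_TIMEOUT_MS) if secondary else None
    return first, second
```

With an invented device that has operational PS0-PS2, a non-operational PS3
(5000 us entry, 10000 us exit) and a non-operational PS4 (5000 us entry,
45000 us exit), pick_targets() selects PS3 after 100 ms of idle time and
PS4 after 2000 ms.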

Can you give an example of the APST states and latencies on a device
for which this is useful?

I'm not opposed to adjusting the algorithm, but I'd like to understand
what we're up against.  If Linux were the only game in town, I would
say that the approach in this patch is unfortunate because of the
arbitrary thresholds it introduces, but if it tracks Windows, then
we're probably okay.

--Andy


