NVMe APST high latency power states being skipped

Kai-Heng Feng kai.heng.feng at canonical.com
Thu May 25 01:21:07 PDT 2017


On Wed, May 24, 2017 at 1:31 PM, Andy Lutomirski <luto at kernel.org> wrote:
> On Tue, May 23, 2017 at 9:53 PM, Kai-Heng Feng
> <kai.heng.feng at canonical.com> wrote:
>> On Wed, May 24, 2017 at 6:09 AM,  <Mario.Limonciello at dell.com> wrote:
>> [snipped]
>>>
>>> I know Kai Heng has looked at a /lot/ of disks. I've got stats from a few
>>> of them, but there are many more that I haven't seen.
>>
>> Not really; the others I've seen have rather low latencies. We have the
>> same high-latency ones.
>>
>>> Perhaps Chris or Kai Heng might be able to suggest a better value
>>> to base this on, from their experience.
>>
>> A quick summary: we need a latency tolerance of at least 61000 µs to let
>> all of them enter PS3, and 1100000 µs for PS4.
>>
>> I'll do some performance testing on the drive that needs 1100000.
>>
>> Is there any way to observe the power state transitions in NVMe?
>
> I don't think so, sadly.  It's probably possible to use non-autonomous
> transitions to force low power and then do some IO.  I can try to
> fiddle with this and see how hard it would be to whip up a simple
> benchmark.
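The 61000 and 1100000 figures above are presumably latency tolerances in microseconds. A minimal Python sketch of the math involved, under stated assumptions: a non-operational state is treated as eligible when its entry plus exit latency (ENLAT + EXLAT) does not exceed the tolerance, a higher state index is assumed to be a deeper state, and the driver is assumed to wait about 50x the total latency of idle time before entering a state. It also parses `nvme id-ctrl` style output, in the spirit of the "grep and do some math" TLP idea raised later in the thread. The nvme-cli line format and every sample number are assumptions, not data from any real drive; all helper names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed (hypothetical) nvme-cli line shape; check your nvme-cli version:
#   "ps    3 : mp:0.0400W non-operational enlat:3000 exlat:7000 rrt:3 rrl:3"
PS_LINE = re.compile(
    r"^ps\s+(\d+)\s*:\s*mp:\S+\s+(operational|non-operational)"
    r"\s+enlat:(\d+)\s+exlat:(\d+)",
    re.MULTILINE,
)


@dataclass(frozen=True)
class PowerState:
    index: int
    non_operational: bool
    enlat_us: int  # entry latency, microseconds
    exlat_us: int  # exit latency, microseconds

    @property
    def total_us(self) -> int:
        return self.enlat_us + self.exlat_us


def parse_power_states(id_ctrl_text: str) -> list[PowerState]:
    """Extract power-state descriptors from `nvme id-ctrl`-style text."""
    return [
        PowerState(int(i), kind == "non-operational", int(en), int(ex))
        for i, kind, en, ex in PS_LINE.findall(id_ctrl_text)
    ]


def min_tolerance_us(states: list[PowerState], target: int) -> int:
    """Smallest tolerance that makes non-operational state `target` eligible."""
    for ps in states:
        if ps.index == target and ps.non_operational:
            return ps.total_us
    raise ValueError(f"PS{target} is not a non-operational state")


def deepest_allowed(states: list[PowerState], tolerance_us: int) -> int | None:
    """Highest-numbered non-operational state whose ENLAT + EXLAT fits."""
    ok = [ps.index for ps in states
          if ps.non_operational and ps.total_us <= tolerance_us]
    return max(ok) if ok else None


def idle_before_entry_ms(ps: PowerState) -> int:
    """Assumed idle time before entering `ps`: 50x total latency, rounded up to ms."""
    return (ps.total_us * 50 + 999) // 1000


# Hypothetical id-ctrl excerpt; the numbers are made up.
sample = """\
ps    0 : mp:6.00W operational enlat:0 exlat:0 rrt:0 rrl:0
ps    3 : mp:0.0400W non-operational enlat:3000 exlat:7000 rrt:3 rrl:3
ps    4 : mp:0.0050W non-operational enlat:9000 exlat:41000 rrt:4 rrl:4
"""
states = parse_power_states(sample)
print(min_tolerance_us(states, 3), min_tolerance_us(states, 4))
print(deepest_allowed(states, 25000), idle_before_entry_ms(states[1]))
```

With these made-up values, a 25000 µs tolerance admits PS3 (10000 µs) but not PS4 (50000 µs). Under this model the tolerance needed to reach a state is that state's ENLAT + EXLAT, so the smallest value that lets every disk in a collection reach PS3 is the maximum of those sums across the disks.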

I ran some benchmarks on the high-latency SanDisk A400:
Kernel compilation, no PS3/PS4:
real    23m36.466s
user    115m49.944s
sys     10m58.352s

Kernel compilation, allow PS3/PS4:
real    24m40.308s
user    116m12.600s
sys     11m47.484s

I also played a 4K video downloaded from YouTube; there were no visible stutters.
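To put the kernel-compilation timings in relative terms, a small Python helper (hypothetical, not part of any tool) that converts the `time`-style durations to seconds and computes the percentage slowdown of "allow PS3/PS4" over "no PS3/PS4":

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a `time`-style duration such as "23m36.466s" to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", duration)
    if m is None:
        raise ValueError(f"unexpected duration: {duration!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def slowdown_pct(baseline: str, candidate: str) -> float:
    """Relative increase of `candidate` over `baseline`, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100


# (no PS3/PS4, allow PS3/PS4) timings from the runs above
runs = {
    "real": ("23m36.466s", "24m40.308s"),
    "user": ("115m49.944s", "116m12.600s"),
    "sys": ("10m58.352s", "11m47.484s"),
}
for name, (no_ps34, with_ps34) in runs.items():
    print(f"{name}: {slowdown_pct(no_ps34, with_ps34):+.1f}%")
```

This works out to roughly +4.5% real, +0.3% user and +7.5% sys time. Only one result per configuration was posted, so run-to-run variance is unknown and these figures are a rough indication.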

>
>>
>> [snipped]
>>
>>> I think separate from the effort of getting the default right this makes sense.
>>> To me the most important default should be getting the disk into at least
>>> the first non-operational state even if latency is bad.
>>>
>>> Then provide the ability for user code to block that non-operational
>>> state, or to enter other non-operational states that would otherwise be
>>> blocked due to latency.
>>
>> We could add this to TLP by grepping the PS3 latencies out of `nvme
>> id-ctrl` and doing some math, but it would be ugly.
>>
>>>
>>>>
>>>> >
>>>> > Kai Heng can comment more on the testing they've done and the performance
>>>> > impact, but I understand that by tweaking those knobs they've been able to
>>>> > get all these disks into at least PS3 and saved a lot of power.
>>>> >
>>>> > We could work with the TLP project or the PowerTOP developers and have
>>>> > them tweak the various sysfs knobs to make more of these disks work,
>>>> > but I would rather the kernel had good defaults across this collection of disks.
>>>>
>>>> Agreed.
>>
>> Besides TLP/PowerTOP, we should make this easy to use from
>> something like thermald.
>> NVMe drives run quite hot. It would be quite useful to let thermald control
>> the maximum available power state directly via a sysfs knob. Fanless
>> devices would benefit a lot from this.
>
> Hmm.  That's doable but isn't strictly part of APST.  We could add a
> sysfs knob "operating_power_state" and a sysfs file that lists the
> available operating states.  APST is about transitions to
> *non-operating* states.
>
> Unfortunately, the info in the provided tables is almost entirely
> worthless when it comes to describing the performance impact of using
> reduced-power operating states.  Also, I wouldn't personally be
> shocked to see some interesting hardware bugs.

You are right, but we should allow the NVMe drive to transition to
non-operational states when the whole system is too hot, even if those
states have pretty bad latency.
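A rough Python sketch of the policy suggested here, assuming a thermald-like daemon raises the NVMe latency tolerance when the system is hot so that deep, slow-to-exit states become eligible, and lowers it again once things cool down. The function only picks a value and does no I/O; which sysfs knob a daemon would write it to (for example a per-device `power/pm_qos_latency_tolerance_us` attribute) is an assumption, and the temperature thresholds and the 25000 µs "normal" value are illustrative, not recommendations. `ThermalPolicy` and `next_tolerance_us` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalPolicy:
    hot_c: float    # at or above this temperature, allow deep states
    cool_c: float   # at or below this temperature, restore the normal tolerance
    normal_us: int  # latency tolerance used when the system is cool
    hot_us: int     # tolerance large enough to admit the deepest state


def next_tolerance_us(policy: ThermalPolicy, temp_c: float, current_us: int) -> int:
    """Choose the NVMe latency tolerance for the current temperature.

    Uses hysteresis: between cool_c and hot_c the previous value is kept,
    so the setting does not flap around a single threshold.
    """
    if temp_c >= policy.hot_c:
        return policy.hot_us
    if temp_c <= policy.cool_c:
        return policy.normal_us
    return current_us


# Illustrative thresholds only; 1100000 is the PS4 figure quoted in this thread.
policy = ThermalPolicy(hot_c=70.0, cool_c=60.0, normal_us=25000, hot_us=1100000)
tolerance = policy.normal_us
for temp in (55.0, 72.0, 65.0, 58.0):
    tolerance = next_tolerance_us(policy, temp, tolerance)
    print(f"{temp:.0f} C -> {tolerance} us")
```

The hysteresis band is the main design choice: with a single threshold, a drive hovering around it would keep toggling between tolerances, which is exactly the kind of churn a thermal policy should avoid.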



More information about the Linux-nvme mailing list