`nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMe
Paul Menzel
pmenzel at molgen.mpg.de
Thu May 2 22:52:32 PDT 2024
Dear Keith,
Thank you for your reply with a lot of background. This is much appreciated.
On 02.05.24 at 10:43, Keith Busch wrote:
> On Thu, May 02, 2024 at 08:12:39AM +0200, Paul Menzel wrote:
>>>> That doesn't seem too hard to believe to me. A safe shutdown can often
>>>> take a while for an SSD. I've seen other implementations orders of
>>>> magnitude worse than what you're showing.
>>>
>>> But why? Due to physics or due to "slow" firmware?
>
> Maybe both? The run-time metadata doesn't necessarily match the on-disk
> format, and constructing that can take a moment. These devices' CPUs are
> usually the cheapest the vendor could get that satisfies a run-time
> performance target, so they may be underpowered for computational tasks.
>
> And it may also have to flush pending user data from its internal
> memory, which could be a few GB.
>
> Lower-end devices don't even have onboard DRAM, so they may have to make
> many round trips to host memory to retrieve their metadata and then
> convert it to the on-disk format.
Thank you for the details. Indeed, “slow” firmware can also be caused by
low-performance chips.
> Maybe this could be better optimized, but vendors may not consider
> shutdown time to be a high priority.
As this is all a black box, it’s hard to know. If this were more visible,
vendors might make it a higher priority.
> This gets worse as you add more nvme devices to your system because
> shutdown is serialized. Some of us have proposed patches parallelizing
> this process. I wish I could spend more time on helping see that to
> completion, but other priorities get in the way. :(
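To make the cost of serialization concrete, here is a back-of-the-envelope model, not kernel code: if controller i needs t_i to shut down, doing them one after another costs the sum of all t_i, while issuing every shutdown first and then waiting costs roughly the slowest t_i. The function names are hypothetical, and the model ignores per-device overhead.

```c
#include <stddef.h>

/* Serialized shutdown: each controller starts only after the previous
 * one has finished, so the individual times add up. */
static unsigned int serial_shutdown_ms(const unsigned int *ms, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += ms[i];
	return total;
}

/* Parallel shutdown (idealized): all controllers work concurrently,
 * so the slowest one determines the total wait. */
static unsigned int parallel_shutdown_ms(const unsigned int *ms, size_t n)
{
	unsigned int worst = 0;

	for (size_t i = 0; i < n; i++)
		if (ms[i] > worst)
			worst = ms[i];
	return worst;
}
```

For example, four hypothetical drives that each take 411 ms would cost 1644 ms when shut down serially, but only about 411 ms in the idealized parallel case.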
I didn’t know that. I only have systems with one NVMe device. In other
areas, like bringing up CPU cores and applying microcode updates, people
are also trying to parallelize initialization to decrease boot time.
>>> So this confirms the ftrace findings. Excuse my ignorance: is the
>>> timeout in seconds? And how does this relate to the rtd3e value
>>> (410 ms ≠ 60 ms ≠ 5 s?)?
>
> The driver provides a user tunable parameter to specify the minimum
> timeout value, and it defaults to 5 seconds.
>
> nvme_core.shutdown_timeout=<time_in_seconds>
>
> The driver selects this or the advertised rtd3e, whichever is greater.
Reading the code, that is in `nvme_init_identify()`:
	if (id->rtd3e) {
		/* us -> s */
		u32 transition_time = le32_to_cpu(id->rtd3e) / USEC_PER_SEC;

		ctrl->shutdown_timeout = clamp_t(unsigned int, transition_time,
						 shutdown_timeout, 60);

		if (ctrl->shutdown_timeout != shutdown_timeout)
			dev_info(ctrl->device,
				 "D3 entry latency set to %u seconds\n",
				 ctrl->shutdown_timeout);
	} else
		ctrl->shutdown_timeout = shutdown_timeout;
> We can't trust devices to report this correctly (and NVMe 1.0 didn't
> even provide a way for a device to report an expected shutdown time), so
> this exists to prevent unsafe shutdowns. Devices are supposed to survive
> an unsafe shutdown, but it's best to avoid that path.
So is this what happens in my case: the SK hynix reports 60 ms but
actually takes 411 ms?
> The parameter is in granularity of seconds because the NVMe 1.0 spec
> said to "wait at least one second" for a shutdown to complete. Not the
> most clear wording for a spec, but that's where we started.
Thank you for the details. I am not a spec writer, but my gut feeling
says there should always be a polling solution, and only upper bounds
(“no longer than”) should be specified.
Kind regards,
Paul
More information about the Linux-nvme mailing list