`nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMe

Keith Busch kbusch at kernel.org
Thu May 2 01:43:36 PDT 2024


On Thu, May 02, 2024 at 08:12:39AM +0200, Paul Menzel wrote:
> > > That doesn't seem too hard to believe to me. A safe shutdown can often
> > > take a while for an SSD. I've seen other implementations orders of
> > > magnitude worse than what you're showing.
> > 
> > But why? Due to physics or due to "slow" firmware?

Maybe both? The run-time metadata doesn't necessarily match the on-disk
format, and constructing that can take a moment. These devices' CPUs are
usually the cheapest the vendor could get that satisfies a run-time
performance target, so they may be underpowered for computational tasks.

And it may also have to flush pending user data from its internal
memory, which could be a few GB.

Lower-end devices don't even have memory of their own, so they may have
to make many round trips to host memory to retrieve their metadata and
then manipulate that into the on-disk format.

Maybe this could be better optimized, but vendors may not consider
shutdown time to be a high priority.

This gets worse as you add more nvme devices to your system because
shutdown is serialized. Some of us have proposed patches parallelizing
this process. I wish I could spend more time helping see that through
to completion, but other priorities get in the way. :(
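
To put rough numbers on the serialization cost: if devices are shut down
one after another, the waits add up, whereas a parallel shutdown would
ideally be bounded by the slowest device. A small sketch, assuming the
devices don't slow each other down; the function names are made up, and
only the 411 ms figure comes from this thread, the other times are
hypothetical:

```python
def serial_shutdown_ms(per_device_ms):
    """Devices shut down one at a time: total is the sum of the waits."""
    return sum(per_device_ms)


def parallel_shutdown_ms(per_device_ms):
    """All shutdowns issued together: total is the slowest device's wait."""
    return max(per_device_ms, default=0)


# Hypothetical system with four drives; only 411 ms comes from this thread.
times_ms = [411, 411, 900, 1500]
print(serial_shutdown_ms(times_ms))    # sum of all waits
print(parallel_shutdown_ms(times_ms))  # slowest single wait
```

The gap between the two grows with every device added, which is why a
single slow drive matters more on a system with many of them.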

> > So this confirms the ftrace findings. Excuse my ignorance, so the
> > time-out is in seconds? And how does this relate to the rtd3e value (410
> > ms /= 60 ms /= 5 s(?)?

The driver provides a user-tunable parameter to specify the minimum
timeout value, in seconds, and it defaults to 5 seconds.

  nvme_core.shutdown_timeout=<time_in_seconds>

The driver selects this or the advertised rtd3e, whichever is greater.
We can't trust devices to report this correctly (and NVMe 1.0 didn't
even provide a way for a device to report an expected shutdown time), so
this exists to prevent unsafe shutdowns. Devices are supposed to survive
an unsafe shutdown, but it's best to avoid that path.
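
A minimal sketch of that selection, not the driver's actual code. It
assumes rtd3e is given in microseconds (the unit the NVMe spec uses for
that field, with 0 meaning "not reported") and that it is rounded down to
whole seconds before the comparison. The function name and the rounding
are illustrative assumptions:

```python
USEC_PER_SEC = 1_000_000


def effective_shutdown_timeout(param_seconds: int, rtd3e_usec: int) -> int:
    """Return the shutdown wait in seconds: the larger of the tunable and rtd3e."""
    # Assumption: the advertised value is truncated to whole seconds.
    # An rtd3e of 0 (not reported) simply loses the comparison.
    advertised_seconds = rtd3e_usec // USEC_PER_SEC
    return max(param_seconds, advertised_seconds)


# Hypothetical values, for illustration only.
print(effective_shutdown_timeout(5, 60_000))     # 60 ms advertised
print(effective_shutdown_timeout(5, 8_000_000))  # 8 s advertised
print(effective_shutdown_timeout(5, 0))          # nothing advertised
```

With the default of 5, a device advertising a sub-second rtd3e still gets
a 5 second budget. The timeout only bounds how long the driver waits; it
says nothing about how long a given shutdown actually takes.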

The parameter has a granularity of seconds because the NVMe 1.0 spec
said to "wait at least one second" for a shutdown to complete. Not the
clearest wording for a spec, but that's where we started.


