smartctl "kills" specific drives on kernel 5.13 but works fine on 5.15 - why?

Nick Neumann nick at pcpartpicker.com
Thu Sep 15 10:50:54 PDT 2022


On Wed, Sep 14, 2022 at 2:44 PM Nick Neumann <nick at pcpartpicker.com> wrote:
>
> I'm running Ubuntu 20.04 LTS with HWE, which reports kernel 5.13.0-51
> generic. Both a Crucial P5 1TB and a Crucial P5 2TB behave rather
> poorly. With one drive installed, running
>
> sudo smartctl -x /dev/nvme0
>
> will output some info, then hang for a while, and then print
> "NVME_IOCTL_ADMIN_CMD: Interrupted system call"
>
> From that point on, the drives are gone from the system until I cut
> and restore power (reboot is not enough).
>
> Running smartctl against the drives works fine in Windows and in
> Ubuntu 22.04 LTS, which reports kernel 5.15.0-43.
>
> I thought for sure I'd find that a quirk for the drives had been added
> between kernels 5.13 and 5.15, but alas, I don't see one. The PCI
> Vendor/Device ID is 1344:5405 for the 1TB model, and while the Crucial
> P2 has a quirk in drivers/nvme/host/pci.c, it has a different vendor
> ID altogether (c0a9).
>
> Any thoughts on where I can look or what I might compare to try to
> figure out what changed to get the Crucial P5 drives behaving? I was
> hoping there was some setting I could tweak to get them going without
> having to move to 22.04 LTS. (I've tried
> "nvme_core.default_ps_max_latency_us=0" and various values for
> "pci_aspm" with no luck.)

Figured this out. It isn't a Linux kernel change, but rather a
smartctl change. (In hindsight I should have started digging there
first.)

The issue was https://www.smartmontools.org/ticket/1404, fixed in the
7.2 release (Ubuntu 20.04 LTS ships 7.1). The fix in smartmontools
was to read logs 4KB at a time, just like nvme-cli did in
https://github.com/linux-nvme/nvme-cli/commit/465a4d. (The device
advertises an MDTS of 9, so, as far as I understand, reading in 4KB
chunks should not be necessary; the smartmontools author was not
certain where the blame for the issue really belonged, but changing
to work like nvme-cli avoids it.)
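
To make the MDTS remark concrete, here is a minimal Python sketch of
the arithmetic and of what "4KB at a time" means for a single log
read. It is not the smartmontools or nvme-cli code. MDTS is a power of
two in units of the controller's minimum memory page size; the 4KB
minimum page size and the 16KB log length used below are assumptions
for illustration, and the function names are made up.

```python
PAGE_4K = 4096  # assumed minimum memory page size (not stated in the thread)


def mdts_limit_bytes(mdts, min_page_bytes=PAGE_4K):
    """Largest single transfer implied by MDTS; None means no limit (MDTS = 0)."""
    if mdts == 0:
        return None
    return (1 << mdts) * min_page_bytes


def chunk_plan(total_bytes, chunk_bytes=PAGE_4K):
    """Split one log read into (offset, length) pieces of at most chunk_bytes."""
    return [
        (offset, min(chunk_bytes, total_bytes - offset))
        for offset in range(0, total_bytes, chunk_bytes)
    ]


if __name__ == "__main__":
    # MDTS = 9 with 4KB pages allows 2^9 * 4096 bytes per command.
    print(mdts_limit_bytes(9))   # 2097152, i.e. 2MB
    # A hypothetical 16KB log becomes four 4KB reads.
    print(chunk_plan(16 * 1024))
```

Under those assumptions the advertised limit is 2MB per command, far
above any 4KB chunk, which is why the chunking looks like a workaround
for the drive rather than something the advertised limit requires.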

For now I'll avoid reading the error log via smartctl on problematic
drives until I can move to a later smartmontools version.

Thanks,
Nick


