[PATCH] fix: nvme_update_ns_info method should be called even if nvme_ns_ids_equal return false

Tao Jin me at kingtous.cn
Fri Apr 8 17:58:27 PDT 2022


Thanks for your kind reply.

The output of the command "nvme ns-descs /dev/nvme0n1" is shown below:

before suspend:
NVME Namespace Identification Descriptors NS 1:
uuid    : 01000000-0000-0000-0000-000000000000

but after suspend, the uuid seems to have disappeared:
NVME Namespace Identification Descriptors NS 1:
eui64   : 0100000000000000

If I do more suspend operations, the output is the same:
NVME Namespace Identification Descriptors NS 1:
eui64   : 0100000000000000


Note that I'm using a kernel that I customized myself, with
"goto out_free_id" commented out. This means "nvme_update_ns_info" is
called even if the identifier check fails. I did this because I can't
suspend with the official kernel: doing so makes my SSD invisible to
Linux, which triggers ext4 errors and freezes the laptop.

```
static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_ids *ids)
{
	struct nvme_id_ns *id;
	int ret = NVME_SC_INVALID_NS | NVME_SC_DNR;

	if (test_bit(NVME_NS_DEAD, &ns->flags))
		goto out;

	ret = nvme_identify_ns(ns->ctrl, ns->head->ns_id, ids, &id);
	if (ret)
		goto out;

	ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
	if (!nvme_ns_ids_equal(&ns->head->ids, ids)) {
		dev_err(ns->ctrl->device,
			"identifiers changed for nsid %d\n", ns->head->ns_id);
-		goto out_free_id;
	}

	ret = nvme_update_ns_info(ns, id);

out_free_id:
	kfree(id);
out:
	/*
	 * Only remove the namespace if we got a fatal error back from the
	 * device, otherwise ignore the error and just move on.
	 *
	 * TODO: we should probably schedule a delayed retry here.
	 */
	if (ret > 0 && (ret & NVME_SC_DNR))
		nvme_ns_remove(ns);
}
```
In addition, Windows 10/11 has no suspend issue on this laptop. It's 
really weird.
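
As a side note, here is a minimal user-space sketch (not kernel code) of
the check that is tripping. It assumes the identifier comparison is a
plain byte-wise comparison of the uuid, nguid and eui64 fields; the
struct and function names below are hypothetical stand-ins, not the
kernel's struct nvme_ns_ids or nvme_ns_ids_equal. The second helper
tests whether two EUI-64 values are byte-reversed copies of each other,
which is what the dmesg line quoted below looks like.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model of a namespace identifier set. */
struct ns_ids_model {
	uint8_t eui64[8];
	uint8_t nguid[16];
	uint8_t uuid[16];
};

/* Assumed semantics: identifiers match only if every field is byte-identical. */
bool ns_ids_model_equal(const struct ns_ids_model *a,
			const struct ns_ids_model *b)
{
	return memcmp(a->uuid, b->uuid, sizeof(a->uuid)) == 0 &&
	       memcmp(a->nguid, b->nguid, sizeof(a->nguid)) == 0 &&
	       memcmp(a->eui64, b->eui64, sizeof(a->eui64)) == 0;
}

/* Returns true if b is a with its eight bytes in reverse order. */
bool eui64_is_byte_reversed(const uint8_t a[8], const uint8_t b[8])
{
	for (size_t i = 0; i < 8; i++) {
		if (a[i] != b[7 - i])
			return false;
	}
	return true;
}
```

With the two values from dmesg (01 00 00 00 00 00 00 00 and
00 00 00 00 00 00 00 01), eui64_is_byte_reversed() returns true, and
under the assumption above ns_ids_model_equal() returns false for them.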

On 2022/4/9 00:04, Christoph Hellwig wrote:
> On Fri, Apr 08, 2022 at 09:18:19AM -0600, Keith Busch wrote:
>> On Fri, Apr 08, 2022 at 10:07:21AM +0200, Christoph Hellwig wrote:
>>> On Fri, Apr 08, 2022 at 03:56:49PM +0800, 金韬 wrote:
>>>> This is output from dmesg. Seems that "eui" has changed.
>>>>
>>>> [    2.086226] loop0: detected capacity change from 0 to 8
>>>> [   26.577001] eui changed from 0100000000000000 to 0000000000000001
>>>> [   26.577003] nvme nvme0: identifiers changed for nsid 1
>>>
>>> Ok, looks like the device is broken and changes the EUID after power
>>> cycles.  Can you send the output of lspci -v?
>>>
>>> Also just out of curiosity, does the ID keep changing if you do more
>>> suspend cycles?
>>
>> The eui isn't legit in the first place (no OUI), and appears to be swapping the
> 
> Yes.
> 
>> byte order during resume. This should be reported to the vendor.
> 
> Well, the id-ns output posted earlier shows the same output before and
> after resume.  Which is really weird.
> 
> Either way we'll have to quirk it some way.
> 
> Just to pinpoint this a bit, what does
> 
>     nvme ns-descs /dev/nvme0n1
> 
> report?  I wonder if we get different IDs from the different methods
> to retrieve them given that namespace allocation looks at the
> Namespace Identification Descriptor last, while revalidation only
> looks at Identify Namespace.
> 


