[PATCH] nvme: Don't fail to resume if NSIDs change

Limonciello, Mario mario.limonciello at amd.com
Mon Jul 31 13:09:08 PDT 2023



On 7/31/2023 2:54 PM, August Wikerfors wrote:
> On 2023-07-31 21:10, Keith Busch wrote:
>> On Mon, Jul 31, 2023 at 01:51:03PM -0500, Mario Limonciello wrote:
>>> Samsung PM9B1 has problems after resume because NSID has changed.
>>> This has been reported in the past on OEM varieties of PM9B1 parts
>>> and fixed by firmware updates on 'some' of those parts.
>>>
>>> However this same issue also happens on 'retail' PM9B1 parts which
>>> Samsung has not released firmware updates for.
>>>
>>> As the check has been relaxed at startup for multiple disks with
>>> duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
>>> probe due to duplicate IDs for single-ported PCIe devices") also
>>> relax the check that runs on resume for NSIDs and mark them bogus
>>> if this occurs on resume.
>>
>> How could the driver tell the difference between the device needing a
>> quirk compared to a rapid delete-create-attach namespace sequence?
>> Proceeding with the namespace now may get dirty writes intended for the
>> previous namespace, corrupting the new one.
>>
>> The commit you mentioned tries to constrain allowing duplication where
>> we can reasonably assume the quirk is needed. If we need to do similar
>> for this condition, one possible constraint might be that the device
>> doesn't report OACS bit 3 (Namespace Management).
> 
> It looks like that would work for the PM9B1:
>> $ sudo nvme id-ctrl -H /dev/nvme0
>> [...]
>> oacs      : 0x17
>>   [10:10] : 0   Lockdown Command and Feature Not Supported
>>   [9:9] : 0     Get LBA Status Capability Not Supported
>>   [8:8] : 0     Doorbell Buffer Config Not Supported
>>   [7:7] : 0     Virtualization Management Not Supported
>>   [6:6] : 0     NVMe-MI Send and Receive Not Supported
>>   [5:5] : 0     Directives Not Supported
>>   [4:4] : 0x1   Device Self-test Supported
>>   [3:3] : 0     NS Management and Attachment Not Supported
>>   [2:2] : 0x1   FW Commit and Download Supported
>>   [1:1] : 0x1   Format NVM Supported
>>   [0:0] : 0x1   Security Send and Receive Supported
> 
> Regards,
> August Wikerfors

So is it reasonable to just add a check for

ctrl->oacs & NVME_CTRL_OACS_NS_MNGT_SUPP

in the same error handling path as this patch?
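
To make the question concrete, here is a rough userspace sketch of the
condition, not the actual patch. The helper name nsid_change_tolerable() is
made up for illustration, and the bit is redefined locally on the assumption
that it matches the kernel's NVME_CTRL_OACS_NS_MNGT_SUPP (OACS bit 3, as in
the id-ctrl output above).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * OACS bit 3: Namespace Management and Attachment supported.
 * Assumed to mirror the kernel's NVME_CTRL_OACS_NS_MNGT_SUPP; redefined
 * here only so the sketch builds outside the kernel tree.
 */
#define OACS_NS_MNGT_SUPP (1u << 3)

/*
 * Hypothetical helper, not an existing kernel function.
 *
 * Returns true when the controller does not advertise namespace
 * management. In that case a delete-create-attach sequence is presumably
 * not possible, so changed IDs after resume look more like a firmware
 * quirk than a genuinely new namespace.
 */
static bool nsid_change_tolerable(uint16_t oacs)
{
	return !(oacs & OACS_NS_MNGT_SUPP);
}
```

With the PM9B1 value quoted above (oacs = 0x17) bit 3 is clear, so the helper
returns true. A controller that sets bit 3 would keep hitting the existing
failure path.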


