[BUG] nvme-pci: NVMe probe fails with ENODEV

Rajat Khandelwal rajat.khandelwal at linux.intel.com
Thu Mar 9 10:13:33 PST 2023


Hi,

On 3/9/2023 10:54 PM, Keith Busch wrote:
> On Thu, Mar 09, 2023 at 10:36:04PM +0530, Rajat Khandelwal wrote:
>> On 3/9/2023 8:54 PM, Keith Busch wrote:
>>> On Thu, Mar 09, 2023 at 04:12:18PM +0100, Christoph Hellwig wrote:
>>>> On Thu, Mar 09, 2023 at 07:31:07PM +0530, Rajat Khandelwal wrote:
>>>>> Hi,
>>>>> I am seeking some help regarding an issue I encounter sporadically
>>>>> with Samsung Portable TBT SSD X5.
>>>>>
>>>>> Everything from Thunderbolt discovery through PCIe enumeration works
>>>>> fine, until the driver tries to read 'NVME_REG_CSTS' in 'nvme_reset_work'.
>>>>> Specifically, 'readl(dev->bar + NVME_REG_CSTS)' fails.
>>>>>
>>>>> I handle Type-C, Thunderbolt and USB4 on Chrome platforms, and currently
>>>>> we are working on Intel Raptor Lake (RPL) systems.
>>>>> This issue has been seen since the Alder Lake (ADL) time frame and now
>>>>> shows up on RPL as well. I would really like to get to the bottom of the
>>>>> problem and close the issue.
>>>>>
>>>>> I have tried 5.10 and 6.1.15 kernels.
>>>> So we have a quirk for a device called Samsung X5 in core.c, which is a
>>>> bit of an unusual match.  Can you check that it gets applied for the
>>>> device that you are testing?
>>>>
>>>> Also if it gets applied, can you test this patch?
>>> That won't help here. The driver should be bailing on the device in
>>> nvme_pci_enable() before we do the ready check:
>>>
>>> static int nvme_pci_enable(struct nvme_dev *dev)
>>> {
>>> ...
>>>           if (readl(dev->bar + NVME_REG_CSTS) == -1) {
>>>                   result = -ENODEV;
>>>                   goto disable;
>>>           }
>>>
>>> It sounds like the bridge has a valid memory window, and the kernel assigned it
>>> to the device, but for some reason the device didn't apply it to its BAR. Maybe
>>> the device just doesn't support hotplug?
>> The issue is sporadic in nature, witnessed even during reboots with the device
>> attached.
>> Is such a scenario even possible (BAR not getting written by the hardware)?
> It's not supposed to be possible, but your analysis checking the BAR register
> with setpci seems pretty convincing that that is happening.

I see. Any suggestions on what to try next?
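
In case it helps anyone reproduce the BAR check from user space, here is a
minimal sketch (not driver code) that decodes the command register and BAR0
from a raw config-space dump, such as the first 64 bytes of
/sys/bus/pci/devices/<BDF>/config. The register offsets follow the standard
PCI header layout. The helper names (cfg_read16, cfg_read32, bar0_address,
bar0_usable, load_config) are made up for illustration and do not exist in
the kernel.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_COMMAND        0x04 /* 16-bit command register */
#define PCI_COMMAND_MEMORY 0x2  /* memory space decoding enabled */
#define PCI_BASE_ADDRESS_0 0x10 /* BAR0, low 32 bits */
#define PCI_BASE_ADDRESS_1 0x14 /* BAR1, upper half of a 64-bit BAR0 */

/* Read little-endian values out of a raw config-space buffer. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Base address programmed into BAR0, or 0 if nothing was written to it. */
static uint64_t bar0_address(const uint8_t *cfg)
{
	uint32_t lo = cfg_read32(cfg, PCI_BASE_ADDRESS_0);
	uint64_t addr = lo & ~0xfULL; /* low 4 bits are type/prefetch flags */

	/* Bits 2:1 == 10b mark a 64-bit memory BAR; BAR1 holds the top half. */
	if ((lo & 0x6) == 0x4)
		addr |= (uint64_t)cfg_read32(cfg, PCI_BASE_ADDRESS_1) << 32;
	return addr;
}

/* True if MMIO could work at all: memory decode is on and BAR0 is set. */
static bool bar0_usable(const uint8_t *cfg)
{
	bool mem_on = cfg_read16(cfg, PCI_COMMAND) & PCI_COMMAND_MEMORY;

	return mem_on && bar0_address(cfg) != 0;
}

/* Load the first 64 bytes of config space from a file; 0 on success. */
static int load_config(const char *path, uint8_t cfg[64])
{
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return -1;
	n = fread(cfg, 1, 64, f);
	fclose(f);
	return n == 64 ? 0 : -1;
}
```

Calling load_config() on the device's sysfs config file and then
bar0_usable() on the buffer shows whether the address assigned by the kernel
is actually present in the BAR. If BAR0 reads back as zero while the bridge
window is valid, that matches the situation described above.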

Thanks
Rajat



