[BUG] nvme-pci: NVMe probe fails with ENODEV

Keith Busch kbusch at kernel.org
Thu Mar 9 09:24:52 PST 2023


On Thu, Mar 09, 2023 at 10:36:04PM +0530, Rajat Khandelwal wrote:
> On 3/9/2023 8:54 PM, Keith Busch wrote:
> > On Thu, Mar 09, 2023 at 04:12:18PM +0100, Christoph Hellwig wrote:
> > > On Thu, Mar 09, 2023 at 07:31:07PM +0530, Rajat Khandelwal wrote:
> > > > Hi,
> > > > I am seeking some help regarding an issue I encounter sporadically
> > > > with Samsung Portable TBT SSD X5.
> > > > 
> > > > Everything is fine from the Thunderbolt discovery through the PCIe
> > > > enumeration, until 'nvme_reset_work' tries to read 'NVME_REG_CSTS'.
> > > > Precisely, 'readl(dev->bar + NVME_REG_CSTS)' fails.
> > > > 
> > > > I handle type-C, thunderbolt and USB4 on Chrome platforms, and currently
> > > > we are working on Intel Raptorlake systems.
> > > > This issue has been seen since the ADL time-frame and now occurs
> > > > on RPL as well. I would really like to get to the bottom of the problem
> > > > and close the issue.
> > > > 
> > > > I have tried 5.10 and 6.1.15 kernels.
> > > So we have a quirk for a device called Samsung X5 in core.c, which is a
> > > bit of an unusual match.  Can you check that it gets applied for the
> > > device that you are testing?
> > > 
> > > Also if it gets applied, can you test this patch?
> > That won't help here. The driver should be bailing on the device in
> > nvme_pci_enable() before we do the ready check:
> > 
> > static int nvme_pci_enable(struct nvme_dev *dev)
> > {
> > ...
> >          if (readl(dev->bar + NVME_REG_CSTS) == -1) {
> >                  result = -ENODEV;
> >                  goto disable;
> >          }
> > 
> > It sounds like the bridge has a valid memory window, and the kernel assigned it
> > to the device, but for some reason the device didn't apply it to its BAR. Maybe
> > the device just doesn't support hotplug?
> 
> The issue is sporadic in nature, witnessed even during reboots with the device
> attached.
> Is such a scenario even possible (BAR not getting written by the hardware)?

It's not supposed to be possible, but your analysis checking the BAR register
with setpci seems pretty convincing that that is happening.
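
For anyone wanting to script that kind of check, here is a minimal userspace
sketch. It assumes the first 64 bytes of the device's PCI configuration space
have already been read into a buffer (for example from the device's sysfs
'config' file). The helper names are hypothetical and are not part of the nvme
driver; the offsets and bit positions are the standard PCI type 0 header
layout. It only decodes what the device reports, it does not explain why a BAR
write would be lost.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets in the standard PCI type 0 configuration header. */
#define CFG_COMMAND     0x04
#define CFG_BAR0        0x10
#define CMD_MEM_ENABLE  0x2     /* Memory Space Enable bit in the command register */

/* Config space is little-endian; assemble values byte by byte. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | ((uint16_t)cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	       ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) |
	       ((uint32_t)cfg[off + 3] << 24);
}

/*
 * Hypothetical helper: return the address programmed into BAR0.
 * Handles 32-bit and 64-bit memory BARs; returns 0 for an I/O BAR,
 * which is not expected for an NVMe controller.
 */
static uint64_t bar0_address(const uint8_t *cfg)
{
	uint32_t lo = cfg_read32(cfg, CFG_BAR0);
	uint64_t addr;

	if (lo & 0x1)                   /* bit 0 set: I/O space BAR */
		return 0;

	addr = lo & ~0xfULL;            /* low 4 bits are flags, not address */
	if (((lo >> 1) & 0x3) == 0x2)   /* type 10b: 64-bit BAR, upper half follows */
		addr |= (uint64_t)cfg_read32(cfg, CFG_BAR0 + 4) << 32;

	return addr;
}

/*
 * Hypothetical helper: 1 if MMIO through BAR0 cannot work as reported,
 * i.e. no address was programmed or memory decoding is disabled.
 */
static int bar0_unusable(const uint8_t *cfg)
{
	uint16_t cmd = cfg_read16(cfg, CFG_COMMAND);

	return bar0_address(cfg) == 0 || !(cmd & CMD_MEM_ENABLE);
}

/*
 * Mirrors the driver's 'readl(...) == -1' test: an all-ones value is
 * what a read returns when nothing responds at that address.
 */
static int mmio_read_failed(uint32_t value)
{
	return value == 0xffffffffu;
}
```

If bar0_unusable() reports 1 while the upstream bridge window covers the range
the kernel assigned, that would be consistent with the device not having
latched the BAR write, which is the case where the CSTS read comes back as all
ones.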


