[bugzilla-daemon at kernel.org: [Bug 217251] New: pciehp: nvme not visible after re-insert to tbt port]

Keith Busch kbusch at kernel.org
Mon Mar 27 11:25:30 PDT 2023


On Mon, Mar 27, 2023 at 05:43:18PM +0000, Aleksander Trofimowicz wrote:
> 
> Keith Busch <kbusch at kernel.org> writes:
> 
> > On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
> >> Forwarding to NVMe folks, lists for visibility.
> >>
> >> ----- Forwarded message from bugzilla-daemon at kernel.org -----
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217251
> >> ...
> >>
> >> Created attachment 304031
> >>   --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
> >> the tracing of nvme_pci_enable() during re-insertion
> >>
> >> Hi,
> >>
> >> There is a JHL7540-based device that may host an NVMe device. After the first
> >> insertion an nvme drive is properly discovered and handled by the relevant
> >> modules. Once disconnected any further attempts are not successful. The device
> >> is visible on a PCI bus, but nvme_pci_enable() ends up calling
> >> pci_disable_device() every time; the runtime PM status of the device is
> >> "suspended", the power status of the 04:01.0 PCI bridge is D3. Preventing the
> >> device from being power managed ("on" -> /sys/devices/../power/control)
> >> combined with device removal and pci rescan changes nothing. A host reboot
> >> restores the initial state.
> >>
> >> I would appreciate any suggestions how to debug it further.
> >
> > Sounds the same as this report:
> >
> >   http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
> >
> > The driver is bailing on the device because we can't read its status register
> > out of the remapped BAR. There's nothing we can do about that from the nvme
> > driver level. Memory mapped IO has to work in order to proceed.
> >
> Thanks. I can confirm it is the same problem:
> 
> a) the platform is Intel Alderlake
> b) readl(dev->bar + NVME_REG_CSTS) in nvme_pci_enable() fails
> c) reading BAR0 via setpci gives 0x00000004

It's strange, too. In your example, the kernel says:

  0000:05:00.0: BAR 0: assigned [mem 0x54000000-0x54003fff 64bit]

There is a check right after that message that ensures the kernel reads back
what it wrote. No failures reported means the device really did have the
expected BAR value at one point.
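
To make the mismatch concrete, here is a rough sketch that decodes the low
dword of a memory BAR using the standard PCI layout (bit 0 = I/O vs. memory,
bits 2:1 = type, bit 3 = prefetchable, bits 31:4 = address). The helper name
decode_bar_low() and its field names are made up for illustration and are not
kernel APIs. The "expected" value 0x54000004 is an assumption derived from the
assignment message above (64-bit, not marked "pref"); it is not taken from a
log.

```python
# Sketch: decode the low dword of a PCI BAR (standard PCI BAR bit layout).
# decode_bar_low() and its field names are illustrative, not kernel APIs.

def decode_bar_low(value):
    """Split the low 32 bits of a BAR into its flag and address fields."""
    if value & 0x1:
        # Bit 0 set means an I/O BAR; the address lives in bits 31:2.
        return {"space": "io", "base": value & 0xFFFFFFFC}
    mem_type = (value >> 1) & 0x3  # 0b00 = 32-bit, 0b10 = 64-bit
    return {
        "space": "mem",
        "is_64bit": mem_type == 0x2,
        "prefetchable": bool(value & 0x8),
        "base": value & 0xFFFFFFF0,  # address bits 31:4
    }

# Assumed low dword for "assigned [mem 0x54000000-0x54003fff 64bit]".
assigned = decode_bar_low(0x54000004)
# Value reported when reading BAR0 via setpci.
observed = decode_bar_low(0x00000004)

print(assigned)
print(observed)
print("address bits lost:", observed["base"] == 0 and assigned["base"] != 0)
```

Read this way, 0x00000004 still carries the "64-bit memory BAR" type bits but
has all address bits clear, which is what an unprogrammed BAR looks like. That
fits the BAR having been written successfully at one point and losing its
value later.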


