[EXT] Re: nvme may get timeout from dd when using different non-prefetch mmio outbound/ranges
Li Chen
lchen at ambarella.com
Mon Oct 25 20:40:54 PDT 2021
Hi Keith and Bjorn,
> -----Original Message-----
> From: Keith Busch [mailto:kbusch at kernel.org]
> Sent: Tuesday, October 26, 2021 12:22 AM
> To: Bjorn Helgaas
> Cc: Li Chen; linux-pci at vger.kernel.org; Lorenzo Pieralisi; Rob Herring;
> kw at linux.com; Bjorn Helgaas; linux-kernel at vger.kernel.org; Tom Joseph; Jens
> Axboe; Christoph Hellwig; Sagi Grimberg; linux-nvme at lists.infradead.org
> Subject: [EXT] Re: nvme may get timeout from dd when using different non-prefetch mmio outbound/ranges
>
> On Mon, Oct 25, 2021 at 10:47:39AM -0500, Bjorn Helgaas wrote:
> > [+cc Tom (Cadence maintainer), NVMe folks]
> >
> > On Fri, Oct 22, 2021 at 10:08:20AM +0000, Li Chen wrote:
> > > pciec: pcie-controller at 2040000000 {
> > > compatible = "cdns,cdns-pcie-host";
> > > device_type = "pci";
> > > #address-cells = <3>;
> > > #size-cells = <2>;
> > > bus-range = <0 5>;
> > > linux,pci-domain = <0>;
> > > cdns,no-bar-match-nbits = <38>;
> > > vendor-id = <0x17cd>;
> > > device-id = <0x0100>;
> > > reg-names = "reg", "cfg";
> > > reg = <0x20 0x40000000 0x0 0x10000000>,
> > > <0x20 0x00000000 0x0 0x00001000>; /* RC only */
> > >
> > > /*
> > > * type: 0x00000000 cfg space
> > > * type: 0x01000000 IO
> > > * type: 0x02000000 32bit mem space No prefetch
> > > * type: 0x03000000 64bit mem space No prefetch
> > > * type: 0x43000000 64bit mem space prefetch
> > > * The First 16MB from BUS_DEV_FUNC=0:0:0 for cfg space
> > > *   <0x00000000 0x00 0x00000000 0x20 0x00000000 0x00 0x01000000>; CFG_SPACE
> > > */
> > > ranges = <0x01000000 0x00 0x00000000 0x20 0x00100000 0x00 0x00100000>,
> > >          <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x08000000>;
> > >
> > > #interrupt-cells = <0x1>;
> > > interrupt-map-mask = <0x00 0x0 0x0 0x7>;
> > > interrupt-map = <0x0 0x0 0x0 0x1 &gic 0 229 0x4>,
> > > <0x0 0x0 0x0 0x2 &gic 0 230 0x4>,
> > > <0x0 0x0 0x0 0x3 &gic 0 231 0x4>,
> > > <0x0 0x0 0x0 0x4 &gic 0 232 0x4>;
> > > phys = <&pcie_phy>;
> > > phy-names="pcie-phy";
> > > status = "ok";
> > > };
> > >
> > >
> > > After some digging, I find if I change the controller's range
> > > property from
> > >
> > > <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x08000000> into
> > > <0x02000000 0x00 0x00400000 0x20 0x00400000 0x00 0x08000000>,
> > >
> > > then dd will succeed without a timeout. IIUC, the range here
> > > is only for non-prefetch 32-bit MMIO, but dd will use DMA (maybe the
> > > CPU will send commands to the nvme controller via MMIO?).
>
> Generally speaking, an nvme driver notifies the controller of new
> commands via a MMIO write to a specific nvme register. The nvme
> controller fetches those commands from host memory with a DMA.
>
> One exception to that description is if the nvme controller supports CMB
> with SQEs, but they're not very common. If you had such a controller,
> the driver will use MMIO to write commands directly into controller
> memory instead of letting the controller DMA them from host memory. Do
> you know if you have such a controller?
>
> The data transfers associated with your 'dd' command will always use DMA.
>
My NVMe device is "05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980". Its datasheet (https://s3.ap-northeast-2.amazonaws.com/global.semi.static/Samsung_NVMe_SSD_980_Data_Sheet_Rev.1.1.pdf) says nothing about CMB/SQEs, so I'm not sure. Are there other ways or tools (like nvme-cli) to query this?
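For what it's worth, I believe CMB support is advertised in the controller registers: `nvme show-regs /dev/nvme0` (from nvme-cli) should dump CMBLOC/CMBSZ, and a CMBSZ of zero means no CMB. A minimal sketch of decoding CMBSZ, assuming the field layout from the NVMe base spec (this helper is my own, not nvme-cli code):

```python
# Decode the NVMe CMBSZ register (offset 0x3C); field layout assumed
# from the NVMe base specification.
def cmb_supported(cmbsz):
    """Return None when there is no CMB, else the capabilities it advertises."""
    if cmbsz == 0:
        return None  # CMBSZ reads as zero when no CMB is implemented
    szu = (cmbsz >> 8) & 0xF      # size units: 4 KiB * 16^SZU
    sz = (cmbsz >> 12) & 0xFFFFF  # size, expressed in SZU units
    return {
        "sqs": bool(cmbsz & 0x1),  # submission queues may be placed in CMB
        "cqs": bool(cmbsz & 0x2),  # completion queues may be placed in CMB
        "size_bytes": sz * (4096 << (4 * szu)),
    }

print(cmb_supported(0))                # no CMB at all
print(cmb_supported((1 << 12) | 0x3))  # tiny CMB with SQ/CQ support
```

If SQS is clear (or CMBSZ is zero), the driver cannot be writing SQEs into controller memory, so that exception would not apply here.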
> > I don't know how to interpret "ranges". Can you supply the dmesg and
> > "lspci -vvs 0000:05:00.0" output both ways, e.g.,
> >
> > pci_bus 0000:00: root bus resource [mem 0x7f800000-0xefffffff window]
> > pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
> > pci 0000:05:00.0: [vvvv:dddd] type 00 class 0x...
> > pci 0000:05:00.0: reg 0x10: [mem 0x.....000-0x.....fff ...]
> >
> > > Question:
> > > 1. Why dd can cause nvme timeout? Is there more debug ways?
>
> That means the nvme controller didn't provide a response to a posted
> command within the driver's latency tolerance.
FYI, with the help of the PCI bridge's vendor, we found something interesting: "From the CATC trace, I saw some memory read packets sent from the SSD card, but their addresses fall within the memory range of the switch downstream port, so the switch downstream port replies with a UR packet. That seems abnormal." In other words: why would the SSD send memory requests whose addresses lie inside the switch downstream port's own window, causing the switch to respond with UR packets? I also don't understand how this can happen.
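To make the failure mode concrete: per the vendor's description, a request whose address falls inside the port's own memory window cannot be forwarded to host memory, so it is completed as an Unsupported Request. A toy sketch of that routing check (all window and address values below are hypothetical, not taken from this system):

```python
# Toy model of the vendor's observation: a switch downstream port claims
# requests targeting its own memory window, so an endpoint's DMA read
# hitting that window never reaches host memory and completes as a UR.
def completes_as_ur(dma_addr, window_base, window_limit):
    return window_base <= dma_addr <= window_limit

# Hypothetical downstream-port window, only to illustrate the check:
WINDOW = (0x2008000000, 0x200FFFFFFF)
assert completes_as_ur(0x2008100000, *WINDOW)    # inside the window -> UR
assert not completes_as_ur(0x80000000, *WINDOW)  # normal host-memory target
```

A UR on a read the SSD issued for a command fetch or data transfer would explain the driver-side timeout: the controller never gets the data it needs to complete the command.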
>
> > > 2. How can this mmio range affect nvme timeout?
>
> Let's see how those ranges affect what the kernel sees in the pci
> topology, as Bjorn suggested.
Ok, will add details in another mail replying to Bjorn.
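For reference while comparing: each `ranges` entry here is (type flags, 64-bit PCI address, 64-bit CPU address, 64-bit size). A quick sketch decoding the two MEM32 variants from this thread, assuming the #address-cells = 3 / #size-cells = 2 layout of the node above:

```python
# Decode one DT 'ranges' entry: 3 child (PCI) cells, 2 parent (CPU) cells,
# 2 size cells, matching #address-cells/#size-cells in the pciec node.
SPACES = {0x01000000: "IO", 0x02000000: "MEM32",
          0x03000000: "MEM64", 0x43000000: "MEM64-pref"}

def decode_range(cells):
    flags, pci_hi, pci_lo, cpu_hi, cpu_lo, sz_hi, sz_lo = cells
    return (SPACES.get(flags, hex(flags)),
            (pci_hi << 32) | pci_lo,   # PCI bus address
            (cpu_hi << 32) | cpu_lo,   # CPU physical address
            (sz_hi << 32) | sz_lo)     # window size

for name, cells in [
    ("failing", [0x02000000, 0x00, 0x08000000, 0x20, 0x08000000, 0x00, 0x08000000]),
    ("working", [0x02000000, 0x00, 0x00400000, 0x20, 0x00400000, 0x00, 0x08000000]),
]:
    space, pci, cpu, size = decode_range(cells)
    print(f"{name}: {space} PCI {pci:#x} -> CPU {cpu:#x}, size {size:#x}")
```

So both windows are 128 MB of non-prefetchable MEM32; only the base moves from 0x08000000 to 0x00400000 (on both the PCI and CPU side), which should show up as different root bus resources and BAR assignments in the dmesg/lspci output Bjorn asked for.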
Regards,
Li