nvme may get timeout from dd when using different non-prefetch mmio outbound/ranges
Keith Busch
kbusch at kernel.org
Mon Oct 25 09:21:58 PDT 2021
On Mon, Oct 25, 2021 at 10:47:39AM -0500, Bjorn Helgaas wrote:
> [+cc Tom (Cadence maintainer), NVMe folks]
>
> On Fri, Oct 22, 2021 at 10:08:20AM +0000, Li Chen wrote:
> > pciec: pcie-controller@2040000000 {
> >         compatible = "cdns,cdns-pcie-host";
> >         device_type = "pci";
> >         #address-cells = <3>;
> >         #size-cells = <2>;
> >         bus-range = <0 5>;
> >         linux,pci-domain = <0>;
> >         cdns,no-bar-match-nbits = <38>;
> >         vendor-id = <0x17cd>;
> >         device-id = <0x0100>;
> >         reg-names = "reg", "cfg";
> >         reg = <0x20 0x40000000 0x0 0x10000000>,
> >               <0x20 0x00000000 0x0 0x00001000>; /* RC only */
> >
> >         /*
> >          * type: 0x00000000 cfg space
> >          * type: 0x01000000 IO
> >          * type: 0x02000000 32bit mem space No prefetch
> >          * type: 0x03000000 64bit mem space No prefetch
> >          * type: 0x43000000 64bit mem space prefetch
> >          * The first 16MB from BUS_DEV_FUNC=0:0:0 for cfg space
> >          * <0x00000000 0x00 0x00000000 0x20 0x00000000 0x00 0x01000000>, CFG_SPACE
> >          */
> >         ranges = <0x01000000 0x00 0x00000000 0x20 0x00100000 0x00 0x00100000>,
> >                  <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x08000000>;
> >
> >         #interrupt-cells = <0x1>;
> >         interrupt-map-mask = <0x00 0x0 0x0 0x7>;
> >         interrupt-map = <0x0 0x0 0x0 0x1 &gic 0 229 0x4>,
> >                         <0x0 0x0 0x0 0x2 &gic 0 230 0x4>,
> >                         <0x0 0x0 0x0 0x3 &gic 0 231 0x4>,
> >                         <0x0 0x0 0x0 0x4 &gic 0 232 0x4>;
> >         phys = <&pcie_phy>;
> >         phy-names = "pcie-phy";
> >         status = "ok";
> > };
> >
> >
> > After some digging, I find that if I change the controller's ranges
> > property from
> >
> > <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x08000000> to
> > <0x02000000 0x00 0x00400000 0x20 0x00400000 0x00 0x08000000>,
> >
> > then dd succeeds without a timeout. IIUC, this range is only for
> > non-prefetchable 32-bit MMIO, but dd should use DMA (or maybe the CPU
> > sends commands to the nvme controller via MMIO?).
Generally speaking, the nvme driver notifies the controller of new
commands via an MMIO write to a specific nvme doorbell register. The nvme
controller then fetches those commands from host memory with a DMA.

One exception to that description is an nvme controller that supports CMB
with SQEs, but those are not very common. If you had such a controller,
the driver would use MMIO to write commands directly into controller
memory instead of letting the controller DMA them from host memory. Do
you know if you have such a controller?

The data transfers associated with your 'dd' command will always use DMA.
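
For illustration, the submission path looks roughly like the sketch below.
This is not the actual drivers/nvme/host/pci.c code; the toy_* names and
the sq_in_cmb flag are made up for this example, but it shows where MMIO
is involved in each case:

#include <linux/io.h>           /* writel(), memcpy_toio() */
#include <linux/nvme.h>         /* struct nvme_command (the 64-byte SQE) */
#include <linux/string.h>       /* memcpy() */
#include <linux/types.h>

/* Minimal stand-in for the driver's per-queue state, for illustration. */
struct toy_nvme_queue {
	struct nvme_command *sq_cmds;   /* SQ in host memory (DMA-able)      */
	void __iomem *sq_cmds_io;       /* SQ placed in the controller's CMB */
	void __iomem *q_db;             /* SQ tail doorbell register in BAR0 */
	u16 sq_tail;
	u16 q_depth;
	bool sq_in_cmb;
};

/*
 * Queue one command. In the common case only the final doorbell write is
 * MMIO; the controller fetches the SQE (and moves the data) from host
 * memory by DMA on its own. With CMB-resident SQs, the memcpy_toio() is
 * an MMIO write of the command straight into controller memory.
 */
static void toy_nvme_submit_cmd(struct toy_nvme_queue *nvmeq,
				struct nvme_command *cmd)
{
	if (nvmeq->sq_in_cmb)
		memcpy_toio(nvmeq->sq_cmds_io + nvmeq->sq_tail * sizeof(*cmd),
			    cmd, sizeof(*cmd));
	else
		memcpy(&nvmeq->sq_cmds[nvmeq->sq_tail], cmd, sizeof(*cmd));

	if (++nvmeq->sq_tail == nvmeq->q_depth)
		nvmeq->sq_tail = 0;

	/* Tell the controller there is new work: one 32-bit doorbell write. */
	writel(nvmeq->sq_tail, nvmeq->q_db);
}

If nvme-cli is available, I believe 'nvme show-regs' will report non-zero
CMBLOC/CMBSZ values on a controller that actually has a CMB.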
> I don't know how to interpret "ranges". Can you supply the dmesg and
> "lspci -vvs 0000:05:00.0" output both ways, e.g.,
>
> pci_bus 0000:00: root bus resource [mem 0x7f800000-0xefffffff window]
> pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
> pci 0000:05:00.0: [vvvv:dddd] type 00 class 0x...
> pci 0000:05:00.0: reg 0x10: [mem 0x.....000-0x.....fff ...]
>
> > Question:
> > 1. Why can dd cause an nvme timeout? Are there more ways to debug this?
That means the nvme controller didn't provide a response to a posted
command within the driver's latency tolerance.
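
To be a bit more concrete: the "latency tolerance" is just the block layer
request timeout the driver registers for its queues (30 seconds for I/O by
default, tunable through the nvme_core.io_timeout module parameter, if I
remember correctly). A rough sketch of the shape, with made-up toy_* names
rather than the real drivers/nvme/host/pci.c code:

#include <linux/blk-mq.h>

/*
 * Illustrative only. If a request has not completed within the timeout
 * the driver asked for when it created its tagset, blk-mq invokes the
 * .timeout hook; in the real driver that is where the "I/O ... QID ...
 * timeout" message, the abort attempt and the controller reset live.
 */
static enum blk_eh_timer_return toy_nvme_timeout(struct request *req,
						 bool reserved)
{
	/* try to abort the command and/or schedule a controller reset */
	return BLK_EH_RESET_TIMER;      /* or BLK_EH_DONE once handled */
}

static const struct blk_mq_ops toy_nvme_mq_ops = {
	/* .queue_rq, .complete, ... omitted */
	.timeout = toy_nvme_timeout,
};

So "dd caused a timeout" really means some read or write command (or the
DMA behind it) never completed within that window.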
> > 2. How can this MMIO range affect the nvme timeout?
Let's see how those ranges affect what the kernel sees in the pci
topology, as Bjorn suggested.
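
In the meantime, decoding the two memory entries under discussion with the
usual PCI "ranges" layout of <flags, PCI address (2 cells), CPU address
(2 cells), size (2 cells)>, and assuming I'm reading them right:

  <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x08000000>
    32-bit non-prefetchable memory, PCI bus address 0x0800_0000,
    CPU address 0x20_0800_0000, size 0x0800_0000 (128MB)

  <0x02000000 0x00 0x00400000 0x20 0x00400000 0x00 0x08000000>
    the same 128MB non-prefetchable window, with both the PCI and CPU
    sides moved down to 0x0040_0000 / 0x20_0040_0000

So the only difference between the failing and working configurations
should be where that one memory window sits, and the dmesg/lspci output
should show how the nvme BARs end up placed inside it in each case.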