[PATCH V4 0/7] nvme: pci: fix & improve timeout handling
Keith Busch
keith.busch at linux.intel.com
Tue May 8 08:09:33 PDT 2018
On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote:
> 3rd and 4th attempts slightly better, but clearly not dependable
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O) [failed]
> runtime ... 81.188s
> --- tests/block/011.out 2018-05-05 18:01:14.268414752 -0400
> +++ results/nvme0n1/block/011.out.bad 2018-05-05 19:44:48.848568687 -0400
> @@ -1,2 +1,3 @@
> Running block/011
> +tests/block/011: line 47: echo: write error: Input/output error
> Test complete
>
> This one passed
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O) [passed]
> runtime 81.188s ... 43.400s
>
> I will capture a vmcore next time it panics and give some information
> after analyzing the core

We definitely should never panic, but I am not sure this blktest can
reliably avoid IO errors: the test disables memory space access and bus
mastering without the driver's knowledge, and it does this repeatedly
in a tight loop. If the test happens to disable the device while the
driver is trying to recover from the previous iteration, the recovery
will surely fail, so I think IO errors should be expected in that case.

As far as I can tell, the only way you'll actually get it to succeed is
if the test's subsequent "enable" happens to coincide with the driver's
pci_enable_device_mem() call during reset, leaving the pci_dev's
enable_cnt > 1, which prevents the disabling for the remainder of the
test's looping.

I still think this is a very good test, but we might be able to make it
more deterministic about what actually happens to the PCI device.