[PATCH V4 0/7] nvme: pci: fix & improve timeout handling

Ming Lei ming.lei at redhat.com
Wed May 9 19:09:27 PDT 2018


On Wed, May 09, 2018 at 01:46:09PM +0800, jianchao.wang wrote:
> Hi Ming
> 
> I ran some tests on my local machine.
> 
> [  598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller
> 
> This should be a timeout on nvme_reset_dev->nvme_wait_freeze.
> 
> [  598.828743] nvme nvme0: EH 1: before shutdown
> [  599.013586] nvme nvme0: EH 1: after shutdown
> [  599.137197] nvme nvme0: EH 1: after recovery
> 
> EH 1 has marked the state as LIVE.
> 
> [  599.137241] nvme nvme0: failed to mark controller state 1
> 
> So EH 0 failed to mark the state as LIVE,
> and the card was removed.
> Nested EH should not lead to this.

Right.

> 
> [  599.137322] nvme nvme0: Removing after probe failure status: 0
> [  599.326539] nvme nvme0: EH 0: after recovery
> [  599.326760] nvme0n1: detected capacity change from 128035676160 to 0
> [  599.457208] nvme nvme0: failed to set APST feature (-19)
> 
> nvme_reset_dev should identify whether it is nested.

The above is probably caused by a race in updating the controller state.
I hope to find some time this week to investigate it further.
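
To illustrate the kind of ordering I suspect, here is a minimal
userspace sketch. It is not the driver code: the enum, the struct and
the transition table are simplified assumptions, and the names
(ctrl_state, change_state, eh0_loses_race) are made up for this example.
It only shows how the second of two EH contexts can be refused the
move to LIVE once the first one has already made it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model; the real states and rules live in the nvme driver. */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING };

struct ctrl {
	enum ctrl_state state;
};

/* Apply the transition if the (assumed) table allows it. */
static bool change_state(struct ctrl *c, enum ctrl_state new_state)
{
	bool ok = false;

	switch (new_state) {
	case CTRL_LIVE:
		ok = (c->state == CTRL_NEW || c->state == CTRL_RESETTING);
		break;
	case CTRL_RESETTING:
		ok = (c->state == CTRL_NEW || c->state == CTRL_LIVE);
		break;
	case CTRL_DELETING:
		ok = (c->state == CTRL_LIVE || c->state == CTRL_RESETTING);
		break;
	default:
		break;
	}
	if (ok)
		c->state = new_state;
	return ok;
}

/*
 * Replay the ordering from the log: EH 1 finishes recovery first,
 * then EH 0 tries the same transition. Returns true if EH 0 is refused.
 */
static bool eh0_loses_race(void)
{
	struct ctrl c = { .state = CTRL_LIVE };
	bool ok;

	ok = change_state(&c, CTRL_RESETTING);	/* timeout starts recovery */
	assert(ok);
	ok = change_state(&c, CTRL_LIVE);	/* EH 1: after recovery */
	assert(ok);
	return !change_state(&c, CTRL_LIVE);	/* EH 0: already LIVE */
}
```

In this model eh0_loses_race() returns true, which would correspond to
the "failed to mark controller state 1" message followed by removal.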

Also, maybe we can change this so that the controller is removed only
after nested EH has been tried enough times.
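
Roughly, the policy could look like the sketch below. This is only a
userspace illustration under my own assumptions: MAX_NESTED_EH, struct
eh_ctx and eh_after_recovery() are hypothetical names, and the limit of
3 is arbitrary. The point is just that a failed recovery asks for
another EH pass until the budget is used up, and only then removes the
controller.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NESTED_EH	3	/* hypothetical limit, value is arbitrary */

struct eh_ctx {
	int nested_tries;	/* failed recovery attempts so far */
};

enum eh_action { EH_DONE, EH_RETRY, EH_REMOVE };

/* Decide what to do once one recovery attempt has finished. */
static enum eh_action eh_after_recovery(struct eh_ctx *eh, bool recovered)
{
	assert(eh);

	if (recovered) {
		eh->nested_tries = 0;	/* success resets the budget */
		return EH_DONE;
	}
	if (++eh->nested_tries < MAX_NESTED_EH)
		return EH_RETRY;	/* try another EH pass first */
	return EH_REMOVE;		/* budget used up, give up */
}
```

With the limit at 3, the first two failures return EH_RETRY and the
third returns EH_REMOVE, so a single lost race like the one above would
no longer remove the card.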

Thanks,
Ming


