[PATCH 8/9] nvme-pci: break up nvme_timeout and nvme_dev_disable

jianchao.wang jianchao.w.wang at oracle.com
Sun Feb 11 23:51:48 PST 2018


Hi Sagi

Just adding a few supplementary notes here.

On 02/12/2018 10:16 AM, jianchao.wang wrote:
>> I think this is going in the wrong direction. Every state that is needed
>> to handle serialization should be done in core ctrl state. Moreover,
>> please try to avoid handling this locally in nvme-pci, place common
>> helpers in nvme-core and use them. I won't be surprised if fc would
>> need some of these.
>>
>> Also, can we try and not disable the controller from nvme_timeout?
> In fact, that is exactly what this patch tries to do. For the previously outstanding requests,
> this patch returns BLK_EH_NOT_HANDLED and defers the work to nvme_dev_disable.
> 
>> I'm not sure I understand why this is needed at all. What's wrong with
>> scheduling a controller reset/delete? Why is something like
>> nvme_pci_disable_ctrl_directly needed?
> Keith once pointed out to me that we cannot complete and free a request
> before we shut down the controller and disable PCI bus mastering; otherwise
> DMA accesses can go wrong, because completing a request frees its
> associated DMA mapping.
> 
> For the previously outstanding requests, this patch grabs them with blk_abort_request
> in nvme_dev_disable, and completes them after we disable/shutdown the controller.
> 
> But for the adminq requests issued by nvme_dev_disable and nvme_reset_work themselves, we cannot do this.
> We cannot schedule another reset_work->nvme_dev_disable to handle them, because we are already in it.
> So I use nvme_pci_disable_ctrl_directly, which admittedly looks ugly, to disable the
> controller so that we can then complete the requests with a failure status and move forward.
> 
>> I'd like to see an effort to consolidate error handling paths rather
>> than enhancing the current skew...
> Yes, absolutely. That is also what I expect. :)
> 
> This patch has two aspects:
> 1. Grab all the previously outstanding requests with blk_abort_request.
>    This is safe against races with the irq completion path. Then complete them
>    after we disable/shutdown the controller completely.
>    I think this part could be placed in the nvme core.

The 'grab' here is to prevent the timeout path from being triggered during nvme_dev_disable.
This is important, because nvme_timeout may issue commands on the adminq or invoke nvme_pci_disable_ctrl_directly,
both of which could race with nvme_dev_disable.
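
To make that concrete, here is a minimal sketch of the shape I have in mind (not the actual
patch: nvme_grab_request() and nvme_dev_disable_sketch() are made-up names, and the
busy_tag_iter_fn prototype differs between kernel versions; this follows the v4.15-era
void-returning form):

static void nvme_grab_request(struct request *req, void *data, bool reserved)
{
	/*
	 * Take the request away from the normal completion/timeout
	 * machinery; as noted above, this is safe against races with
	 * the irq completion path.
	 */
	blk_abort_request(req);
}

static void nvme_dev_disable_sketch(struct nvme_dev *dev)
{
	/* 1. grab everything that is still outstanding */
	blk_mq_tagset_busy_iter(&dev->tagset, nvme_grab_request, NULL);
	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_grab_request, NULL);

	/*
	 * 2. quiesce the queues and disable/shutdown the controller and
	 *    PCI bus mastering, as nvme_dev_disable already does today.
	 */

	/*
	 * 3. Only now is it safe to complete (fail) the grabbed requests,
	 *    because the device can no longer DMA into their mappings.
	 */
	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
}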

> 2. Avoid having nvme_timeout invoke nvme_dev_disable. This is the trickiest part.

And invoking nvme_dev_disable from nvme_timeout also introduces some dangerous scenarios; I have reported some of them before.

>    As I shared above, we have to _disable_ the controller _before_ we complete the adminq requests
>    issued from nvme_dev_disable and nvme_reset_work. Consequently, we cannot do what
>    nvme_rdma_timeout does, namely schedule a recovery work and then return.

Actually, nvme_timeout has a pattern similar to nvme_rdma_timeout.
When an adminq/IOq request times out, we can schedule reset_work for it. But if a request issued from the reset_work
procedure itself times out, we cannot schedule reset_work again. At that point, we have to close the controller
directly and fail the request.
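
In other words, something along these lines (again only a sketch: nvme_timeout_sketch() is a
made-up name, the NVME_CTRL_RESETTING check is just illustrative of "was this issued from the
reset/disable path", and nvme_pci_disable_ctrl_directly() is the helper introduced by this series):

static enum blk_eh_timer_return nvme_timeout_sketch(struct request *req, bool reserved)
{
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct nvme_dev *dev = iod->nvmeq->dev;

	if (dev->ctrl.state == NVME_CTRL_RESETTING) {
		/*
		 * The timed-out request was issued by reset_work /
		 * nvme_dev_disable itself, so we cannot schedule
		 * reset_work again: close the controller directly and
		 * fail the request to keep making forward progress.
		 */
		nvme_pci_disable_ctrl_directly(dev);
		nvme_req(req)->status = NVME_SC_ABORT_REQ;
		return BLK_EH_HANDLED;
	}

	/*
	 * Normal case: kick the recovery and let nvme_dev_disable grab
	 * and complete this request after the controller is disabled.
	 */
	nvme_reset_ctrl(&dev->ctrl);
	return BLK_EH_NOT_HANDLED;
}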

Looking forward to some advice on this.
It would be really appreciated.

Thanks
Jianchao