[PATCH 2/2] nvmet: Fix fatal_err_work deadlock

Sagi Grimberg sagi at grimberg.me
Mon Oct 2 15:45:10 PDT 2017


>> The fatal error handler assumed that delete_ctrl execution is
>> asynchronous, given that controller teardown is refcounted by queues,
>> which are in turn refcounted by inflight IO. This suggests that the
>> actual controller free is async by nature; I probably should have
>> documented it...
>>
>> Does fc's delete_ctrl block until all inflight IO is drained? I would
>> suggest deferring this blocking routine out of the fatal_error path, like
>> rdma and loop do. Is that something that breaks your design?
> 
> No - it really doesn't block waiting (like the host side), although it
> may appear that way. The real difference is that it processes the
> teardown in its entirety, and it's possible, especially under light/idle
> load, that the ref counting causes things to occur in the delete_ctrl
> context, whereas rdma and loop definitely convert over to another workq
> context for teardown. Yes, I can do that too. Yes, if there are
> requirements like this for a transport, please add
> comments/documentation. Although, as you can see from this proposed
> patch, an implementation can be made in the core that places no
> requirement on a transport.

I assume that we can change that to give some flexibility to the
transport implementation, so:

Acked-by: Sagi Grimberg <sagi at grimberg.me>
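
For readers following the refcounting argument quoted above, here is a
minimal userspace sketch of why the final free can land in the
delete_ctrl context on an idle controller. It is not the nvmet code:
struct ctrl, ctrl_put() and teardown_context() are hypothetical names,
and the queue level of refcounting is collapsed into a single counter
for brevity.

```c
#include <assert.h>

/* Hypothetical labels for "who dropped the last reference". */
enum ctx { CTX_NONE, CTX_DELETE_CTRL, CTX_IO_DONE };

/* Hypothetical stand-in for a refcounted controller. */
struct ctrl {
	int refs;          /* initial reference plus one per inflight IO */
	enum ctx freed_in; /* context in which the release step ran */
};

/* Drop one reference; the release step runs in the caller's context. */
static void ctrl_put(struct ctrl *c, enum ctx caller)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->freed_in = caller;
}

/*
 * Simulate delete_ctrl followed by completion of all inflight IO and
 * report which context ended up performing the final release.
 */
static enum ctx teardown_context(int inflight_ios)
{
	struct ctrl c = { .refs = 1, .freed_in = CTX_NONE };
	int i;

	for (i = 0; i < inflight_ios; i++)
		c.refs++;                   /* each inflight IO holds a ref */

	ctrl_put(&c, CTX_DELETE_CTRL);      /* delete_ctrl drops the initial ref */

	for (i = 0; i < inflight_ios; i++)
		ctrl_put(&c, CTX_IO_DONE);  /* IO completions drop theirs */

	return c.freed_in;
}
```

With no inflight IO, teardown_context(0) reports CTX_DELETE_CTRL, i.e.
the release ran synchronously in the caller. With IO outstanding it
reports CTX_IO_DONE. A caller that must not run the release inline
would need the put deferred to another context, which is what the
thread describes rdma and loop doing with a workq.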


