[PATCH 2/2] nvmet: Fix fatal_err_work deadlock
Sagi Grimberg
sagi at grimberg.me
Mon Oct 2 15:45:10 PDT 2017
>> The fatal error handler assumed that delete_ctrl execution is
>> asynchronous, given that controller teardown is refcounted by queues,
>> which are in turn refcounted by inflight IO. This suggests that the
>> actual controller free is async by nature; I probably should have
>> documented it...
>>
>> Does fc's delete_ctrl block until all inflight IO is drained? I would
>> suggest deferring this blocking routine out of the fatal_error path, as
>> rdma and loop do. Is that something that breaks your design?
>
> No - it really doesn't block waiting (like the host side), although it
> may appear that way. The real difference is that it processes the
> teardown in its entirety, and it's possible, especially on light/idle
> load, that the ref counting could cause things to occur in the
> delete_ctrl context. Whereas rdma and loop definitely convert over to
> another workq context for teardown. Yes, I can do that too. Yes, if
> there are requirements like this for a transport - please add
> comments/documentation. Although, as you can see by this proposed
> patch, an implementation can be made in the core that places no
> requirement on a transport.
I assume that we can change that to give some flexibility to the
transport implementation, so:
Acked-by: Sagi Grimberg <sagi at grimberg.me>
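
For readers following the thread, the difference discussed above can be
sketched in plain userspace C. This is only a toy model under stated
assumptions: every name in it (toy_ctrl, toy_ctrl_put, toy_fatal_err_work,
toy_run_deferred) is hypothetical and is not the nvmet or transport API,
and the "deferred" path is a flag standing in for a separate workq
context. It shows how dropping the last reference can run the free inside
the fatal-error handler's own context, versus handing it off so that it
runs only after the handler has returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these names are hypothetical, not kernel APIs. */
struct toy_ctrl {
    int  refs;                 /* references held by queues / inflight IO */
    bool in_fatal_handler;     /* true while the fatal-error work runs */
    bool free_pending;         /* a deferred free has been queued */
    bool freed;                /* the free routine has run */
    bool freed_inside_handler; /* the free ran in the handler's context */
};

/* Final teardown. Records which context it was called from. */
static void toy_ctrl_free(struct toy_ctrl *c)
{
    assert(c->refs == 0);
    /*
     * Assumption for illustration: a free path that waits for the
     * fatal-error work to finish would be waiting on itself if this
     * flag is set.
     */
    c->freed_inside_handler = c->in_fatal_handler;
    c->freed = true;
}

/* Drop one reference; the last put either frees inline or defers. */
static void toy_ctrl_put(struct toy_ctrl *c, bool defer)
{
    if (--c->refs > 0)
        return;
    if (defer)
        c->free_pending = true; /* hand off to another context */
    else
        toy_ctrl_free(c);       /* runs in the caller's context */
}

/* Stand-in for a separate work context running the queued free. */
static void toy_run_deferred(struct toy_ctrl *c)
{
    if (c->free_pending) {
        c->free_pending = false;
        toy_ctrl_free(c);
    }
}

/* Stand-in for the fatal-error work calling delete_ctrl. */
static void toy_fatal_err_work(struct toy_ctrl *c, bool defer)
{
    c->in_fatal_handler = true;
    toy_ctrl_put(c, defer);
    c->in_fatal_handler = false;
}
```

With a single remaining reference (the light/idle load case), calling
toy_fatal_err_work(&c, false) leaves freed_inside_handler set, while
toy_fatal_err_work(&c, true) followed by toy_run_deferred(&c) frees the
controller outside the handler.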