[PATCH] nvme: Revert: Fix controller creation races with teardown flow

Sagi Grimberg sagi at grimberg.me
Fri Aug 28 19:59:45 EDT 2020


>> This is indeed a regression.
>>
>> Perhaps we should also revert:
>> 12a0b6622107 ("nvme: don't hold nvmf_transports_rwsem for more than 
>> transport lookups")
>>
>> Which inherently caused this by removing the serialization of
>> .create_ctrl()...
> 
> no, I believe the patch on the semaphore is correct. Otherwise - things 
> can be blocked a long time.. a minute (1 cmd timeout) or even multiple 
> minutes in the case where a command failure in core layers effectively 
> gets ignored and thus doesn't cause the error path in the transport. 
> There can be multiple /dev/nvme-fabrics commands stacked up that can 
> make the delays look much longer to the last guy.
> 
> as far as creation vs teardown... yeah, not fun, but there are other 
> ways to deal with it. FC: I got rid of the separate create/reconnect 
> threads a while ago thus the return-control-while-reconnecting behavior, 
> so I've had to deal with it.  It's one area it'd be nice to see some 
> convergence in implementation again between transports.

Doesn't fc have a bug there? in create_ctrl after flushing the
connect_work, what is telling it if delete is running in with it
(or that it already ran...)



More information about the Linux-nvme mailing list