[PATCH 2/2] nvme: make keep-alive synchronous operation
Nilay Shroff
nilay at linux.ibm.com
Mon Oct 7 00:55:56 PDT 2024
On 10/7/24 12:11, Christoph Hellwig wrote:
> On Fri, Oct 04, 2024 at 05:16:57PM +0530, Nilay Shroff wrote:
>> The nvme keep-alive operation, which executes at a periodic interval,
>> could potentially sneak in while shutting down a fabric controller.
>> This may lead to a race between the fabric controller admin queue
>> destroy code path (while shutting down controller) and the blk-mq
>> hw/hctx queuing from the keep-alive thread.
>>
>> This fix helps avoid the race by making keep-alive a synchronous
>> operation, so that the admin queue-usage ref counter is decremented only
>> after the keep-alive command finishes execution and returns its status.
>
> With that you mean ->q_usage_counter?
Yes, I meant ->q_usage_counter.
>
> Moving to synchronous submission and wasting a workqueue context for
> that is a bit sad. I think just removing the blk_mq_free_request call
> from nvme_keep_alive_finish and returning RQ_END_IO_FREE instead
> should have the same effect, or am I missing something?
>
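For concreteness, my reading of the suggestion is roughly the sketch below
(only the tail of nvme_keep_alive_finish is shown; the existing RTT
accounting and status/state handling would stay as they are):

static enum rq_end_io_ret nvme_keep_alive_finish(struct request *rq,
						 blk_status_t status)
{
	struct nvme_ctrl *ctrl = rq->end_io_data;

	/* ... existing RTT accounting and status/state handling ... */

	/*
	 * Instead of calling blk_mq_free_request(rq) here, ask
	 * __blk_mq_end_request() to free the request on our behalf.
	 */
	return RQ_END_IO_FREE;
}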
Unfortunately, that would not help, because we would still fall through the
same code path; we would merely be moving the blk_mq_free_request call from
nvme_keep_alive_finish to its caller. For instance, assuming we keep the
nvme keep-alive work asynchronous, the code path invoking
nvme_keep_alive_finish would be:
nvme_keep_alive_work()
 ->blk_execute_rq_nowait()
  ->blk_mq_run_hw_queue()
   ->blk_mq_sched_dispatch_requests()
    ->__blk_mq_sched_dispatch_requests()
     ->blk_mq_dispatch_rq_list()
      ->nvme_loop_queue_rq()
       ->nvme_fail_nonready_command()
        ->nvme_complete_rq()
         ->nvme_end_req()
          ->blk_mq_end_request()
           ->__blk_mq_end_request()
            ->nvme_keep_alive_finish()
In the above call path, if nvme_keep_alive_finish returns RQ_END_IO_FREE (and
we remove the blk_mq_free_request call from it), then the caller
__blk_mq_end_request would invoke blk_mq_free_request itself, and we may
still hit the same bug: freeing the request allows blk_mq_destroy_queue()
(which may be running on another CPU from the controller shutdown code path)
to make forward progress, and the subsequent blk_put_queue() then deletes the
admin queue resources.
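To make the alternative concrete, the synchronous version I have in mind
looks roughly like the sketch below (TBKAS handling and most error paths are
unchanged and elided; I also assume nvme_keep_alive_finish() is changed to
take the status and the ctrl pointer directly, since end_io_data is no
longer needed, so the exact signature in the final patch may differ):

static void nvme_keep_alive_work(struct work_struct *work)
{
	struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
			struct nvme_ctrl, ka_work);
	struct request *rq;
	blk_status_t status;

	/* ... TBKAS handling unchanged ... */

	rq = blk_mq_alloc_request(ctrl->admin_q, nvme_req_op(&ctrl->ka_cmd),
				  BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(rq)) {
		/* allocation failure, reset the controller */
		dev_err(ctrl->device, "keep-alive failed: %ld\n", PTR_ERR(rq));
		nvme_reset_ctrl(ctrl);
		return;
	}
	nvme_init_request(rq, &ctrl->ka_cmd);
	rq->timeout = ctrl->kato * HZ;

	/*
	 * Block until the command completes. The request, and with it the
	 * reference on ->q_usage_counter, is released only after
	 * nvme_keep_alive_finish() has run, so blk_mq_destroy_queue() on the
	 * shutdown path cannot tear down the admin queue underneath us.
	 */
	status = blk_execute_rq(rq, false);
	nvme_keep_alive_finish(rq, status, ctrl);
	blk_mq_free_request(rq);
}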
>> Also, while we are at it, instead of first acquiring ctrl lock and then
>> accessing NVMe controller state, let's use the helper function
>> nvme_ctrl_state() in nvme_keep_alive_end_io() and get rid of the
>> lock.
>
> Please split that into a separate patch.
Sure, I will split that into a separate patch and resend the series.
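For reference, the split-out cleanup would be just something along these
lines in the completion handling (a sketch; nvme_ctrl_state() is a
READ_ONCE() of ctrl->state, so the spin_lock_irqsave()/spin_unlock_irqrestore()
pair around the state check can go away):

	enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);

	/*
	 * The helper samples ctrl->state atomically, so ctrl->lock is not
	 * needed just to decide whether to requeue the keep-alive work.
	 */
	if (state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING)
		nvme_queue_keep_alive_work(ctrl);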
And thank you for your review comments!
--Nilay