[dm-devel] [PATCH v5] blk-mq: introduce BLK_STS_DEV_RESOURCE
Bart Van Assche
bart.vanassche at wdc.com
Tue Jan 30 09:52:31 PST 2018
On 01/30/18 06:24, Mike Snitzer wrote:
> + *
> + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART
> + * bit is set, run queue after a delay to avoid IO stalls
> + * that could otherwise occur if the queue is idle.
> */
> - if (!blk_mq_sched_needs_restart(hctx) ||
> + needs_restart = blk_mq_sched_needs_restart(hctx);
> + if (!needs_restart ||
> (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
> blk_mq_run_hw_queue(hctx, true);
> + else if (needs_restart && (ret == BLK_STS_RESOURCE))
> + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY);
> }
If a request completes concurrently with execution of the above code
then the request completion will trigger a call of
blk_mq_sched_restart_hctx() and that call will clear the
BLK_MQ_S_SCHED_RESTART bit. If that bit is cleared before the above code
tests it then the above code will schedule an asynchronous call of
__blk_mq_run_hw_queue(). If the .queue_rq() call triggered by the new
queue run returns again BLK_STS_RESOURCE then the above code will be
executed again. In other words, a loop occurs. That loop will repeat as
long as the described race occurs. The current (kernel v4.15) block
layer behavior is simpler: only block drivers call
blk_mq_delay_run_hw_queue() and the block layer core never calls that
function. Hence that loop cannot occur with the v4.15 block layer core
and block drivers. A motivation of why that loop is preferred compared
to the current behavior (no loop) is missing. Does this mean that that
loop is a needless complication of the block layer core?
Sorry but I still prefer the v4.15 block layer approach because this
patch has in my view the following disadvantages:
- It involves a blk-mq API change. API changes are always risky and need
some time to stabilize.
- The delay after which to rerun the queue is moved from block layer
drivers into the block layer core. I think that's wrong because only
the block driver authors can make a good choice for this constant.
- This patch makes block drivers harder to understand. Anyone who sees
return BLK_STS_RESOURCE / return BLK_STS_DEV_RESOURCE for the first
time will have to look up the meaning of these constants. An explicit
blk_mq_delay_run_hw_queue() call is easier to understand.
- This patch makes the blk-mq core harder to understand because of the
loop mentioned above.
- This patch does not fix any bugs nor makes block drivers easier to
read or to implement. So why is this patch considered useful?
Thanks,
Bart.
More information about the Linux-nvme
mailing list