[PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

Wed Oct 5 15:49:17 PDT 2016

On Thu, Oct 6, 2016 at 5:08 AM, Bart Van Assche
<bart.vanassche at sandisk.com> wrote:
> On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
>>
>> I was referring to weather we can take srcu in the submission path
>> conditional of the hctx being STOPPED?
>
>
> Hello Sagi,
>
> Regarding run-time overhead:
> * rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
>   translated into preempt_disable() with preemption enabled. The latter
>   function modifies a per-cpu variable.
> * Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
>   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
>   compiler is told to reread the hctx flags (READ_ONCE()) and if the
>   compiler and CPU are told not to reorder test_bit() with the
>   memory accesses in (s)rcu_read_lock(). To avoid races
>   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
>   has been obtained, similar to the double-checked-locking pattern.
> * srcu_read_lock() reads a word from the srcu structure, disables
>   preemption, calls __srcu_read_lock() and re-enables preemption. The
>   latter function increments two CPU-local variables and triggers a
>   memory barrier (smp_mp()).

We can use srcu read lock for BLOCKING and rcu read lock for non-BLOCKING,
by putting *_read_lock() and *_read_unlock() into two wrappers, which
should minimize the cost of srcu read lock & unlock and the code is still easy
to read & verify.

>
> Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
> code more complicated. Going back to the implementation that calls
> rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
> that is easier to read and to verify.

Yeah, I agree.

Thanks,
Ming Lei