[PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()

Bart Van Assche bart.vanassche at sandisk.com
Wed Oct 5 14:08:57 PDT 2016


On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
> I was referring to weather we can take srcu in the submission path
> conditional of the hctx being STOPPED?

Hello Sagi,

Regarding run-time overhead:
* rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
   translated into preempt_disable() with preemption enabled. The latter
   function modifies a per-cpu variable.
* Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
   compiler is told to reread the hctx flags (READ_ONCE()) and if the
   compiler and CPU are told not to reorder test_bit() with the
   memory accesses in (s)rcu_read_lock(). To avoid races
   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
   has been obtained, similar to the double-checked-locking pattern.
* srcu_read_lock() reads a word from the srcu structure, disables
   preemption, calls __srcu_read_lock() and re-enables preemption. The
   latter function increments two CPU-local variables and triggers a
   memory barrier (smp_mp()).

Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make 
the code more complicated. Going back to the implementation that calls 
rcu_read_lock() if .queue_rq() won't sleep will result in an 
implementation that is easier to read and to verify. If I overlooked 
something, please let me know.

Bart.





More information about the Linux-nvme mailing list