[PATCH v2 1/2] nvme: switch to RCU freeing the namespace

Ming Lin mlin at kernel.org
Mon May 16 15:38:38 PDT 2016


On Sat, May 14, 2016 at 11:58 PM, Ming Lin <mlin at kernel.org> wrote:
> On Mon, 2016-04-25 at 14:20 -0700, Ming Lin wrote:
>>
>> @@ -1654,8 +1655,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>  {
>>       struct nvme_ns *ns;
>>
>> -     mutex_lock(&ctrl->namespaces_mutex);
>> -     list_for_each_entry(ns, &ctrl->namespaces, list) {
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
>>               spin_lock_irq(ns->queue->queue_lock);
>>               queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
>>               spin_unlock_irq(ns->queue->queue_lock);
>> @@ -1663,7 +1664,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>               blk_mq_cancel_requeue_work(ns->queue);

Hi Keith,

I haven't found a good way to fix the BUG below yet; the only thing I've
come up with so far is the rough sketch at the bottom of this mail.
Could you help me understand why blk_mq_cancel_requeue_work() is needed here?

I know blk_mq_cancel_requeue_work() was introduced in:

commit c68ed59f534c318716c6189050af3c5ea03b8071
Author: Keith Busch <keith.busch at intel.com>
Date:   Wed Jan 7 18:55:44 2015 -0700

    blk-mq: Let drivers cancel requeue_work

    Kicking requeued requests will start h/w queues in a work_queue, which
    may alter the driver's requested state to temporarily stop them. This
    patch exports a method to cancel the q->requeue_work so a driver can be
    assured stopped h/w queues won't be started up before it is ready.

    Signed-off-by: Keith Busch <keith.busch at intel.com>
    Signed-off-by: Jens Axboe <axboe at fb.com>
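
For reference (quoting block/blk-mq.c from memory, so please correct me if
I'm misreading it), the helper is just a synchronous cancel of
q->requeue_work, which is where the sleep comes from: cancel_work_sync()
can block in flush_work(), exactly as the trace below shows.

void blk_mq_cancel_requeue_work(struct request_queue *q)
{
	cancel_work_sync(&q->requeue_work);	/* may sleep waiting for the work item */
}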


Thanks,
Ming


>
> My own fault.
>
> We hold the RCU read lock, but blk_mq_cancel_requeue_work() may sleep.
>
> So "echo 1 > /sys/class/nvme/nvme0/reset_controller" triggers below
> BUG.
>
> Thinking on the fix ...
>
> [ 2348.050146] BUG: sleeping function called from invalid context at /home/mlin/linux/kernel/workqueue.c:2783
> [ 2348.062044] in_atomic(): 0, irqs_disabled(): 0, pid: 1696, name: kworker/u16:0
> [ 2348.070810] 4 locks held by kworker/u16:0/1696:
> [ 2348.076900]  #0:  ("nvme"){++++.+}, at: [<ffffffff81088c87>] process_one_work+0x147/0x430
> [ 2348.086626]  #1:  ((&dev->reset_work)){+.+.+.}, at: [<ffffffff81088c87>] process_one_work+0x147/0x430
> [ 2348.097326]  #2:  (&dev->shutdown_lock){+.+...}, at: [<ffffffffc08cef2a>] nvme_dev_disable+0x4a/0x350 [nvme]
> [ 2348.108577]  #3:  (rcu_read_lock){......}, at: [<ffffffffc0813980>] nvme_stop_queues+0x0/0x1a0 [nvme_core]
> [ 2348.119620] CPU: 3 PID: 1696 Comm: kworker/u16:0 Tainted: G           OE   4.6.0-rc3+ #197
> [ 2348.129220] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
> [ 2348.137827] Workqueue: nvme nvme_reset_work [nvme]
> [ 2348.144012]  0000000000000000 ffff8800d94d3a48 ffffffff81379e4c ffff88011a639640
> [ 2348.152867]  ffffffff81a12688 ffff8800d94d3a70 ffffffff81094814 ffffffff81a12688
> [ 2348.161728]  0000000000000adf 0000000000000000 ffff8800d94d3a98 ffffffff81094904
> [ 2348.170584] Call Trace:
> [ 2348.174441]  [<ffffffff81379e4c>] dump_stack+0x85/0xc9
> [ 2348.181004]  [<ffffffff81094814>] ___might_sleep+0x144/0x1f0
> [ 2348.188065]  [<ffffffff81094904>] __might_sleep+0x44/0x80
> [ 2348.194863]  [<ffffffff81087b5e>] flush_work+0x6e/0x290
> [ 2348.201492]  [<ffffffff81087af0>] ? __queue_delayed_work+0x150/0x150
> [ 2348.209266]  [<ffffffff81126cf5>] ? irq_work_queue+0x75/0x90
> [ 2348.216335]  [<ffffffff810ca136>] ? wake_up_klogd+0x36/0x50
> [ 2348.223330]  [<ffffffff810b7fa6>] ? mark_held_locks+0x66/0x90
> [ 2348.230495]  [<ffffffff81088898>] ? __cancel_work_timer+0xf8/0x1c0
> [ 2348.238088]  [<ffffffff8108883b>] __cancel_work_timer+0x9b/0x1c0
> [ 2348.245496]  [<ffffffff810cadaa>] ? vprintk_default+0x1a/0x20
> [ 2348.252629]  [<ffffffff81142558>] ? printk+0x48/0x4a
> [ 2348.258984]  [<ffffffff8108896b>] cancel_work_sync+0xb/0x10
> [ 2348.265951]  [<ffffffff81350fb0>] blk_mq_cancel_requeue_work+0x10/0x20
> [ 2348.273868]  [<ffffffffc0813ae7>] nvme_stop_queues+0x167/0x1a0 [nvme_core]
> [ 2348.282132]  [<ffffffffc0813980>] ? nvme_kill_queues+0x190/0x190 [nvme_core]
> [ 2348.290568]  [<ffffffffc08cef51>] nvme_dev_disable+0x71/0x350 [nvme]
> [ 2348.298308]  [<ffffffff810b8f40>] ? __lock_acquire+0xa80/0x1ad0
> [ 2348.305614]  [<ffffffff810944b6>] ? finish_task_switch+0xa6/0x2c0
> [ 2348.313099]  [<ffffffffc08cffd4>] nvme_reset_work+0x214/0xd40 [nvme]
> [ 2348.320841]  [<ffffffff8176df17>] ? _raw_spin_unlock_irq+0x27/0x50
> [ 2348.328410]  [<ffffffff81088ce3>] process_one_work+0x1a3/0x430
> [ 2348.335633]  [<ffffffff81088c87>] ? process_one_work+0x147/0x430
> [ 2348.343030]  [<ffffffff810891d6>] worker_thread+0x266/0x4a0
> [ 2348.349986]  [<ffffffff8176871b>] ? __schedule+0x2fb/0x8d0
> [ 2348.356852]  [<ffffffff81088f70>] ? process_one_work+0x430/0x430
> [ 2348.364238]  [<ffffffff8108f529>] kthread+0xf9/0x110
> [ 2348.370581]  [<ffffffff8176e912>] ret_from_fork+0x22/0x50
> [ 2348.377344]  [<ffffffff8108f430>] ? kthread_create_on_node+0x230/0x230
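
Following up on my "Thinking on the fix ..." note quoted above: one
direction that might work (an untested sketch only, not a real patch) is
to protect ctrl->namespaces with SRCU instead of plain RCU, since SRCU
read-side critical sections are allowed to sleep. The srcu_struct name
(nvme_ns_srcu) is made up for the sketch, and the namespace removal path
would then have to pair list_del_rcu() with synchronize_srcu()/call_srcu()
instead of synchronize_rcu():

#include <linux/srcu.h>

/* Hypothetical SRCU domain for the namespaces list, for illustration only. */
DEFINE_SRCU(nvme_ns_srcu);

void nvme_stop_queues(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;
	int idx;

	/* Unlike rcu_read_lock(), an SRCU reader may block. */
	idx = srcu_read_lock(&nvme_ns_srcu);
	list_for_each_entry_rcu(ns, &ctrl->namespaces, list) {
		spin_lock_irq(ns->queue->queue_lock);
		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
		spin_unlock_irq(ns->queue->queue_lock);

		blk_mq_cancel_requeue_work(ns->queue);	/* sleeping is fine here now */
		/* ... rest of the loop body unchanged ... */
	}
	srcu_read_unlock(&nvme_ns_srcu, idx);
}

The obvious downsides are that SRCU grace periods are more expensive than
regular RCU ones, and a real patch would also want proper
srcu_dereference()/lockdep annotations for the list walk, so I'm not sure
it's worth it versus just keeping namespaces_mutex in nvme_stop_queues().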


