[BUG REPORT] I/O hangs in blk_execute_rq

Mon Sep 26 19:48:01 PDT 2016

Hi, all

Description of problem:
After the controller of the storage device is rebooted, a kworker thread on the server intermittently enters D state, hung in blk_execute_rq. I found that the request queue's queue_flags has QUEUE_FLAG_STOPPED set. In that case __blk_run_queue returns before calling __blk_run_queue_uncond, so the I/O request is never sent and no response ever arrives. As a result, blk_execute_rq waits in wait_for_completion forever.
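To make the sequence above easier to follow, here is a small userspace model of it. This is not kernel code: every model_* name and the single-slot queue are simplifications invented for illustration, and only the early return on a stopped queue is meant to mirror what is described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all model_* names are hypothetical. */
enum { MODEL_QUEUE_FLAG_STOPPED = 1 << 0 };

struct model_request {
    bool completed;                  /* set once the "driver" finishes it */
};

struct model_queue {
    unsigned int flags;
    struct model_request *pending;   /* at most one queued request */
    unsigned int dispatched;         /* requests that reached the driver */
};

/* Stand-in for __blk_run_queue_uncond(): hand the request to the driver. */
static void model_run_queue_uncond(struct model_queue *q)
{
    if (q->pending != NULL) {
        q->pending->completed = true;
        q->pending = NULL;
        q->dispatched++;
    }
}

/* Stand-in for __blk_run_queue(): a stopped queue is never run. */
static void model_run_queue(struct model_queue *q)
{
    assert(q != NULL);
    if (q->flags & MODEL_QUEUE_FLAG_STOPPED)
        return;                      /* request stays queued, nobody is told */
    model_run_queue_uncond(q);
}

/* Stand-in for blk_execute_rq_nowait(): queue the request, kick the queue. */
static void model_execute_nowait(struct model_queue *q,
                                 struct model_request *rq)
{
    rq->completed = false;
    q->pending = rq;
    model_run_queue(q);
}
```

With MODEL_QUEUE_FLAG_STOPPED set, model_execute_nowait() returns with the request still pending and not completed, so a caller that then blocks until completion (as blk_execute_rq does) would wait until something restarts the queue.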

The stack:
#0 [ffff880034623660] schedule at ffffffff803f76f4
#1 [ffff880034623808] schedule_timeout at ffffffff803f7fad
#2 [ffff880034623898] wait_for_common at ffffffff803f6e95
#3 [ffff880034623928] blk_execute_rq at ffffffff801faf58
#4 [ffff880034623948] blk_rq_append_bio at ffffffff801fa840
#5 [ffff880034623968] blk_rq_map_kern at ffffffff801fa962
#6 [ffff8800346239c8] alua_vpd_inquiry at ffffffffa008cab5 [scsi_dh_alua]
#7 [ffff880034623a28] alua_initialize at ffffffffa008ce4e [scsi_dh_alua]
#8 [ffff880034623a48] alua_bus_attach at ffffffffa008d33e [scsi_dh_alua]
#9 [ffff880034623a68] scsi_dh_handler_attach at ffffffffa007223a [scsi_dh]
#10 [ffff880034623a88] notifier_call_chain at ffffffff803fde47
#11 [ffff880034623ab8] __blocking_notifier_call_chain at ffffffff8006e68b
#12 [ffff880034623af8] device_add at ffffffff802cdee2
#13 [ffff880034623b38] scsi_sysfs_add_sdev at ffffffffa000f8a1 [scsi_mod]
#14 [ffff880034623b68] scsi_add_lun at ffffffffa000c888 [scsi_mod]
#15 [ffff880034623be8] scsi_probe_and_add_lun at ffffffffa000cd89 [scsi_mod]
#16 [ffff880034623ca8] scsi_report_lun_scan at ffffffffa000d326 [scsi_mod]
#17 [ffff880034623d78] __scsi_scan_target at ffffffffa000dab6 [scsi_mod]
#18 [ffff880034623dc8] scsi_scan_target at ffffffffa000e0b1 [scsi_mod]
#19 [ffff880034623e08] fc_scsi_scan_rport at ffffffffa058b12a [scsi_transport_fc]
#20 [ffff880034623e28] process_one_work at ffffffff80060b78
#21 [ffff880034623e78] worker_thread at ffffffff8006452a
#22 [ffff880034623ee8] kthread at ffffffff80068126
#23 [ffff880034623f48] kernel_thread_helper at ffffffff80402894

I checked the latest code, and this logic has not changed. I think we need to add a check for a stopped queue in blk_execute_rq_nowait, just like the existing check for a dying queue. If blk_queue_stopped(q) is true, complete the request with rq->errors set to -EIO and return. This would prevent blk_execute_rq from hanging when QUEUE_FLAG_STOPPED is set.
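The proposed check could look roughly like the userspace sketch below. It is not a patch against blk_execute_rq_nowait: all sketch_* names are hypothetical, the -ENXIO used for the dying case is chosen for illustration, and a real change would also need to respect the queue lock and the kernel's own request-completion helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace sketch only; all sketch_* names are hypothetical. */
enum {
    SKETCH_QUEUE_FLAG_STOPPED = 1 << 0,
    SKETCH_QUEUE_FLAG_DYING   = 1 << 1,
};

struct sketch_request {
    bool completed;   /* a waiter would be woken once this is true */
    int errors;       /* 0 on success, negative errno on failure */
};

struct sketch_queue {
    unsigned int flags;
};

/* Finish a request with the given status so that a waiter can proceed. */
static void sketch_end_request(struct sketch_request *rq, int error)
{
    rq->errors = error;
    rq->completed = true;
}

static void sketch_execute_nowait(struct sketch_queue *q,
                                  struct sketch_request *rq)
{
    assert(q != NULL && rq != NULL);
    rq->completed = false;
    rq->errors = 0;

    /* Models the existing check: a dying queue fails the request. */
    if (q->flags & SKETCH_QUEUE_FLAG_DYING) {
        sketch_end_request(rq, -ENXIO);
        return;
    }

    /* Proposed addition: fail fast instead of queueing on a stopped queue. */
    if (q->flags & SKETCH_QUEUE_FLAG_STOPPED) {
        sketch_end_request(rq, -EIO);
        return;
    }

    /* Normal path, collapsed here to an immediate successful completion. */
    sketch_end_request(rq, 0);
}
```

In this sketch a request submitted to a stopped queue comes back completed with errors == -EIO, so a caller waiting for completion is released instead of sleeping indefinitely.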

Could anybody help take a look at this? Thanks.