[PATCHv2 2/2] nvme: Complete all stuck requests

Mon Feb 27 23:42:19 PST 2017

On 02/27/2017 08:15 PM, Keith Busch wrote:
> On Mon, Feb 27, 2017 at 07:27:51PM +0200, Sagi Grimberg wrote:
>> OK, I think we can get it for fabrics too, need to figure out how to
>> handle it there too.
>>
>> Do you have a reproducer?
> 
> To repro, I have to run a buffered writer workload then put the system into S3.
> 
> This fio job seems to reproduce for me:
> 
>   fio --name=global --filename=/dev/nvme0n1 --bsrange=4k-128k --rw=randwrite --ioengine=libaio --iodepth=8 --numjobs=8 --name=foobar
> 
> I use rtcwake to test suspend/resume:
> 
>   rtcwake -m mem -s 10
> 
> Without the patch we'll get stuck after "Disabling non-boot CPUs ..."
> when blk-mq waits to freeze some entered queues after nvme was disabled.
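
For anyone wanting to script the two quoted steps, a rough sketch follows. The fio and rtcwake invocations are taken verbatim from the text above; the wrapper itself (the `repro` and `run` functions, and the `DEV`, `WARMUP` and `DRY_RUN` variables) is hypothetical and not part of the original report. It assumes fio should already be writing when the suspend is triggered, and it defaults to a dry run that only prints the commands, since the fio job overwrites the target device.

```shell
#!/bin/sh
# Hypothetical wrapper around the two quoted commands; names are illustrative.
DEV="${DEV:-/dev/nvme0n1}"     # WARNING: fio overwrites this device
WARMUP="${WARMUP:-5}"          # assumed delay so writes are in flight
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # Same fio job as quoted above, started in the background.
    set -- fio --name=global --filename="$DEV" --bsrange=4k-128k \
        --rw=randwrite --ioengine=libaio --iodepth=8 --numjobs=8 \
        --name=foobar
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $* &"
    else
        "$@" &
        fio_pid=$!
    fi

    run sleep "$WARMUP"
    # Suspend to RAM and wake up after 10 seconds, as quoted above.
    run rtcwake -m mem -s 10

    # After resume, wait for the writer to finish.
    [ "$DRY_RUN" = 1 ] || wait "$fio_pid"
}
```

Calling `repro` with the defaults just prints the three commands; `DRY_RUN=0 repro` as root would actually run them.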

I'm observing the same thing when hibernating during an mdraid resync on
nvme: it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot
CPUs ...". This patch did not help, but when I put nvme_wait_freeze()
right after nvme_start_freeze() it appeared to work. Maybe the
difference here is that requests are submitted from a non-freezable
kernel thread (md sync_thread)?

Thanks,
Artur