[PATCH 0/4] nvme-blkmq fixes

Mon Dec 22 17:34:32 PST 2014

On Mon, 22 Dec 2014, Keith Busch wrote:
> On Mon, 22 Dec 2014, Jens Axboe wrote:
>> Should be enough to just check for ->rq_pool being initialized or not - if 
>> it is, we could have waiters and we know the waitqueues have been setup, 
>> etc.
>> 
>> V2 attached.
>
> Yep, that fixes the bug.
>
> I'm not sure I follow your suggestion for forcing bt_get() to abandon
> allocating a request tag when the queue is dying. If hctx_may_queue()
> fails, it returns a generic error and bt_get() reschedules itself. Should
> a different error than -1 be returned if the queue is dying?

We're making good incremental improvements, but I'm finding more oddities
the more I test this. This one's a doozy.

Requeued IOs are automatically dispatched, and I don't see an immediately
available way to stop them. That causes a bug because the queue doorbells
are unmapped during reset, so they can't be touched while the queue is
supposed to be quiesced. I could fix that by having the driver not kick
the requeue_list when it knows a reset is in progress, but there's no
immediate way to drain the list if the reset fails and the device requires
removal, so blk_cleanup_queue() will be stuck.
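
To make the requeue problem above concrete, here is a rough userspace
model of the behavior being asked for. It is only a sketch: every name in
it (model_queue, model_requeue, model_kick_requeue_list, the resetting and
dying flags) is hypothetical and none of it is blk-mq or nvme driver code.
It assumes a kick should hold requests while a reset is in progress, and
should fail them out once the device is marked for removal so the list can
drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT kernel structures or APIs. */
struct model_req {
	int tag;
	struct model_req *next;
};

struct model_queue {
	struct model_req *requeue_head;	/* requests waiting for re-dispatch */
	struct model_req *requeue_tail;
	bool resetting;			/* doorbells unmapped, must not dispatch */
	bool dying;			/* reset failed, device is being removed */
	int dispatched;			/* requests that reached the modelled doorbell */
	int failed;			/* requests completed with an error */
};

/* Append a request to the tail of the requeue list. */
static void model_requeue(struct model_queue *q, struct model_req *rq)
{
	assert(q && rq);
	rq->next = NULL;
	if (q->requeue_tail)
		q->requeue_tail->next = rq;
	else
		q->requeue_head = rq;
	q->requeue_tail = rq;
}

/*
 * Kick the requeue list. Returns how many requests left the list.
 *  - resetting and not dying: hold everything, touch nothing
 *  - dying: fail every request so the list drains
 *  - otherwise: dispatch every request
 */
static int model_kick_requeue_list(struct model_queue *q)
{
	int drained = 0;

	assert(q);
	if (q->resetting && !q->dying)
		return 0;

	while (q->requeue_head) {
		struct model_req *rq = q->requeue_head;

		q->requeue_head = rq->next;
		rq->next = NULL;
		if (q->dying)
			q->failed++;
		else
			q->dispatched++;
		drained++;
	}
	q->requeue_tail = NULL;
	return drained;
}
```

The dying branch is the piece that appears to be missing today: without
some way to fail out held requests, a list that was gated for reset never
empties and teardown waits on it forever.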

Is there something available to call that I'm missing or do I need to
add more removal handling?