[PATCH 0/4] nvme-blkmq fixes

Tue Dec 30 18:31:39 PST 2014

Abandon the whole series. There are too many corner cases where it falls
to pieces. I'm running high queue-depth IO with random error injection,
which causes requests to end up on ctx->rq_list, hctx->dispatch, and
q->requeue_list. No matter what I do from the driver, there is always
a case in either reset or removal where a request gets lost and
blk_cleanup_queue() never completes.
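
To illustrate why one lost request is enough to hang teardown, here is a
toy userspace model. It is only a sketch and not kernel code: every name
in it (toy_queue, toy_park, toy_flush, and so on) is hypothetical, and it
assumes nothing more than that cleanup has to wait until no request is
outstanding. The three list slots stand in for ctx->rq_list,
hctx->dispatch, and q->requeue_list.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_list { TOY_CTX_LIST, TOY_DISPATCH, TOY_REQUEUE, TOY_NR_LISTS };

struct toy_queue {
	int parked[TOY_NR_LISTS];	/* requests waiting on each list */
	int in_flight;			/* allocated but not yet completed */
};

/* Allocate a request and park it on one of the lists. */
static void toy_park(struct toy_queue *q, enum toy_list l)
{
	q->parked[l]++;
	q->in_flight++;
}

/* Complete (or fail) everything currently parked on one list. */
static void toy_flush(struct toy_queue *q, enum toy_list l)
{
	assert(q->in_flight >= q->parked[l]);
	q->in_flight -= q->parked[l];
	q->parked[l] = 0;
}

/* In this model, cleanup may only finish once nothing is in flight. */
static bool toy_can_finish_cleanup(const struct toy_queue *q)
{
	return q->in_flight == 0;
}

/*
 * A teardown path that handles two of the three lists and misses the
 * third. Returns the number of requests still outstanding afterwards.
 */
static int toy_requests_left_after_partial_teardown(void)
{
	struct toy_queue q = { .in_flight = 0 };

	toy_park(&q, TOY_CTX_LIST);
	toy_park(&q, TOY_DISPATCH);
	toy_park(&q, TOY_REQUEUE);

	toy_flush(&q, TOY_CTX_LIST);
	toy_flush(&q, TOY_DISPATCH);
	/* TOY_REQUEUE is never flushed, so one request stays in flight. */

	return q.in_flight;
}
```

In the model, toy_can_finish_cleanup() stays false until every list that
can hold a request has been flushed. Missing any one of them in a reset
or removal path leaves a request outstanding, which corresponds to the
hang described above.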

On Tue, 23 Dec 2014, Jens Axboe wrote:
> On 12/23/2014 02:23 PM, Keith Busch wrote:
>> On Tue, 23 Dec 2014, Keith Busch wrote:
>>> @@ -432,7 +432,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
>>>         if (!(status & NVME_SC_DNR || blk_noretry_request(req))
>>>             && (jiffies - req->start_time) < req->timeout) {
>>>             blk_mq_requeue_request(req);
>>> -            blk_mq_kick_requeue_list(req->q);
>>> +            if (!blk_queue_stopped(req->q))
>>> +                blk_mq_kick_requeue_list(req->q);
>>>             return;
>>>         }
>>
>> Oops, experimenting with different things, took the wrong snapshot of
>> the patch. Should be:
>>
>> +            if (nvmeq->cq_vector != -1)
>>
>> rather than:
>>
>> +            if (!blk_queue_stopped(req->q))
>>
>> Anyway, I'm going to keep messing with this until I can either hit the
>> other failures I mentioned or convince myself it's safe before sending
>> something for official consideration.
>
> I was puzzled by the signed change for cq_vector by itself. I'll wait on
> more results from you before doing anything.