[PATCH v3 2/9] nvme-fabrics: allow to queue requests for live queues

Sagi Grimberg sagi at grimberg.me
Tue Aug 25 11:00:39 EDT 2020


>> I checked again, and regarding the comment:
>>
>> "I'm still rather bothered with the admin queue exception.  And given that
>> the q_usage_counter problem should only really be an issue for file system
>> requests, as passthrough requests do not automatically get retried why
>> can't we just reject all user commands to be symmetric and straightforward?
>> The callers in userspace need to be able to cope with retryable errors
>> anyway."
>>
>> blk_mq_alloc_request calls blk_queue_enter, which means that if we don't
>> let them in, controller reset can hang, just like in normal fs I/O.
> 
> Yes.  But the difference is that they don't get retries, but instead just
> fail.

Doesn't matter. The issue is that we get a user I/O request, it enters
the q_usage_counter, goes into queue_rq, and there we check and decide we
want to fail it. Then in nvmf_fail_nonready_command we return
BLK_STS_RESOURCE, and the request is retried and fails again... until it's
allowed to pass through (when the controller is LIVE again).

If someone tries to freeze the queue, the freeze cannot complete because
we have an entered request that will never complete.
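
To make the sequence above concrete, here is a toy userspace model, not
kernel code. All toy_* names are made up for illustration; "usage" only
stands in for q_usage_counter, and the fail_fast flag is an assumption
that models completing the request with an error instead of asking for a
requeue:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel concepts; none of these are real kernel types. */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE };

struct toy_queue {
	int usage;        /* models q_usage_counter: requests that entered */
	bool ctrl_live;   /* models the controller being LIVE */
};

/* Models blk_queue_enter(): the request takes a usage reference. */
static void toy_queue_enter(struct toy_queue *q)
{
	q->usage++;
}

/* Models request completion: the usage reference is dropped. */
static void toy_complete(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/*
 * Models queue_rq while the controller may not be ready.
 * fail_fast == false: ask for a requeue, the request keeps its reference.
 * fail_fast == true:  complete the request (with an error) right away.
 */
static enum toy_status toy_queue_rq(struct toy_queue *q, bool fail_fast)
{
	if (q->ctrl_live || fail_fast) {
		toy_complete(q);
		return TOY_STS_OK;
	}
	return TOY_STS_RESOURCE;
}

/* Models a freeze: it can only finish once no request holds a reference. */
static bool toy_freeze_can_finish(const struct toy_queue *q)
{
	return q->usage == 0;
}

/*
 * Submit one request while the controller is not live, retry it up to
 * max_retries times, and report whether a freeze could then complete.
 */
static bool toy_submit_then_freeze(bool fail_fast, int max_retries)
{
	struct toy_queue q = { .usage = 0, .ctrl_live = false };
	int i;

	toy_queue_enter(&q);
	for (i = 0; i < max_retries; i++) {
		if (toy_queue_rq(&q, fail_fast) == TOY_STS_OK)
			break;
	}
	return toy_freeze_can_finish(&q);
}
```

In this model the requeue path never drops the reference while the
controller stays not live, so the freeze cannot finish no matter how many
retries happen; the fail path drops it on the first attempt.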


>> So we either keep this exception for admin commands, or we also let
>> them through as it seems to be safe with the current code (from the
>> reset forward-progress perspective).
>>
>> I vote to remove this exception altogether...
> 
> To me the right answer would be to reject user commands on the admin
> or I/O queue for the not live controller as the callers need to handle
> it.  That seems to make more sense to me than a special admin queue
> exception.

So you're saying that we should never return BLK_STS_RESOURCE, but rather
fail all requests, regardless of whether they are multipath or not?


