[PATCH] nvme-fabrics: allow to queue requests for live queues
Sagi Grimberg
sagi at grimberg.me
Tue Jul 28 19:39:26 EDT 2020
>>>> Can you please give an example? NVME_REQ_USERCMD should not be any
>>>> different from any other type of I/O.
>>> ...
>>
>> The problem I have with this code is that checking for NVME_REQ_USERCMD
>> is a big hammer that you land on what can be normal I/O
>> (e.g. nvme read/write/write-zeros), and we must not rely on this
>> indication to prevent what you are describing.
>>
>> It may (just maybe) be OK to check NVME_REQ_USERCMD only for the admin
>> queue, but we are really circling around the fact that we cannot reliably
>> guarantee the admin connect goes out first, because we have to unquiesce
>> the admin queue before we issue the admin connect, and a stray command
>> might go into execution before we submit the admin connect.
>
> agree.
>
> Didn't you suggest something separate to isolate the connect/init
> commands? Can we use a different tag set for "initialization" commands?
> We could use different queue_rq routines with different check-ready
> logic. I don't like the overhead, but it sure makes the logic more
> straightforward.
I think I've added this already, now that I'm looking back:
e7832cb48a65 ("nvme: make fabrics command run on a separate request queue")
But it still doesn't solve the user commands you are referring to.
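For context, a minimal sketch of the idea behind that commit, assuming an
admin tag set and a fabrics_q pointer on the controller (the function name
and error handling here are illustrative, not the actual driver code):

	/*
	 * Illustrative sketch: give connect/init fabrics commands their own
	 * request queue on top of the admin tag set, so they are not gated
	 * by the quiesce/freeze state of the regular ctrl->admin_q.
	 */
	static int example_alloc_fabrics_queue(struct nvme_ctrl *ctrl,
					       struct blk_mq_tag_set *admin_set)
	{
		ctrl->admin_q = blk_mq_init_queue(admin_set);
		if (IS_ERR(ctrl->admin_q))
			return PTR_ERR(ctrl->admin_q);

		/* connect and other init-time fabrics commands use this one */
		ctrl->fabrics_q = blk_mq_init_queue(admin_set);
		if (IS_ERR(ctrl->fabrics_q)) {
			blk_cleanup_queue(ctrl->admin_q);
			return PTR_ERR(ctrl->fabrics_q);
		}
		return 0;
	}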
>>>>> As for the blk_rq_is_passthrough check - I guess I can see it being
>>>>> based on the queue state, and the check looks ok (we should never
>>>>> see !blk_rq_is_passthrough on the admin q).
>>>>> But...
>>>>> - I don't know why it was that important to change it. On the
>>>>> connecting path, all you're doing is letting I/O start flowing
>>>>> before all the queues have been created. Did you really need to
>>>>> start that much sooner?
>>>>
>>>> The issue is that a controller in RESETTING state will have requests that
>>>> are being issued, and if we don't let them pass through, they will hang
>>>> around forever being requeued, preventing the queue freeze from completing.
>>>
>
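To make the freeze interaction above concrete: blk_mq_freeze_queue() only
returns once every outstanding request on the queue has completed, so a
request that is endlessly requeued holds the freeze up forever. An
illustrative (non-driver) sketch of the block layer API involved:

	#include <linux/blk-mq.h>

	/*
	 * Illustrative only: the freeze waits for the queue usage counter
	 * to drain, i.e. for every in-flight request to complete.  A
	 * request that keeps being bounced back with BLK_STS_RESOURCE
	 * never completes, so this would hang indefinitely.
	 */
	static void example_freeze_for_update(struct request_queue *q)
	{
		blk_mq_freeze_queue(q);	/* blocks until all requests finish */

		/* ... safely tear down / update queue state here ... */

		blk_mq_unfreeze_queue(q);
	}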
> Isn't this the real problem: we require the request to complete in
> order to get off the request queue, in order to freeze. But to complete,
> it has to finish with some status. This is OK when running multipath, as
> mpath queues and retries - but what if mpath is not running? Will the
> issuer of the I/O get the error?
We already fail, with non-retryable errors, anything that was submitted
before when we cancel (or abort) it, so there is nothing new here.
Non-mpath I/O does not set FAILFAST, so it gets requeued; user commands
do, so they are failed. This is no different than before.
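The decision described here follows the shape of nvmf_fail_nonready_command();
a simplified sketch (abbreviated, not the verbatim upstream function):

	/*
	 * Simplified sketch of the fail-vs-requeue decision for a command
	 * that hits a queue which is not ready.  Plain (non-mpath,
	 * non-FAILFAST) requests are bounced back to blk-mq for a later
	 * retry; FAILFAST and multipath requests are completed with a
	 * path error instead.
	 */
	static blk_status_t example_fail_nonready(struct nvme_ctrl *ctrl,
						  struct request *rq)
	{
		if (ctrl->state != NVME_CTRL_DELETING &&
		    ctrl->state != NVME_CTRL_DEAD &&
		    !blk_noretry_request(rq) &&
		    !(rq->cmd_flags & REQ_NVME_MPATH))
			return BLK_STS_RESOURCE;	/* requeued and retried */

		nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
		blk_mq_start_request(rq);
		nvme_complete_rq(rq);	/* mpath failover or error to issuer */
		return BLK_STS_OK;
	}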
> Should we always be putting even non-mp devices under mp, thus mp
> catches the errors rather than the blk queue, and can have a short delay
> while the controller reconnects ? At least normal io can be handled
> this way and we'd have to figure out the ioctl path. But it should
> still allow the core io, such as scanning, to get the error and fail
> out, to be picked up after the reconnect.
Non-mpath I/O still gets requeued because it doesn't set any sort of
FAILFAST flag.
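For completeness, the gating this thread is circling around ends up looking
roughly like the check below: user commands on the admin queue are held back
until the controller is LIVE, the connect command itself is allowed through
while CONNECTING, and everything else is decided by whether the individual
queue is live. A condensed sketch, not the exact __nvmf_check_ready() code:

	/*
	 * Condensed sketch of the check-ready logic.  Returning false
	 * bounces the request back, where it is either requeued or failed
	 * depending on its FAILFAST/mpath flags (see the sketch above).
	 */
	static bool example_check_ready(struct nvme_ctrl *ctrl,
					struct request *rq, bool queue_live)
	{
		struct nvme_request *req = nvme_req(rq);

		if (likely(ctrl->state == NVME_CTRL_LIVE))
			return true;

		/* hold user commands on the admin queue until init is done */
		if (rq->q == ctrl->admin_q && (req->flags & NVME_REQ_USERCMD))
			return false;

		/* the connect command must pass while we are connecting */
		if (ctrl->state == NVME_CTRL_CONNECTING &&
		    blk_rq_is_passthrough(rq) && nvme_is_fabrics(req->cmd) &&
		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
			return true;

		/* otherwise, only queue to queues that are actually live */
		return queue_live;
	}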