[PATCH v3 5/7] nvme-fabrics: Add host support for FC transport
James Smart
james.smart at broadcom.com
Thu Oct 27 15:28:14 PDT 2016
On 10/25/2016 9:47 AM, Christoph Hellwig wrote:
>
> I've just sent out two patches that get rid of the concept of a CQE
> in the core. I think all this code would benefit a lot from being
> rebased on top of that.
Ok. Done. It didn't change things that much - mainly eliminated some
memsets/memcpys. But it is better.
>
>> +static void
>> +nvme_fc_kill_request(struct request *req, void *data, bool reserved)
>> +{
>> + struct nvme_ctrl *nctrl = data;
>> + struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl);
>> + struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(req);
>> + int status;
>> +
>> + if (!blk_mq_request_started(req))
>> + return;
>> +
>> + status = __nvme_fc_abort_op(ctrl, op);
>> + if (status) {
>> + /* io wasn't active to abort consider it done */
>> + /* assume completion path already completing in parallel */
>> + return;
>> + }
>> +}
> This looks odd - the point of the blk_mq_tagset_busy_iter loops
> is to return the in-core request to the caller. The controller shutdown
> following this would take care of invalidating anything on the wire.
>
> I have to admit that I'm still a bit confused how the FC abort
> fits into the whole NVMe architecture model.
Well, I used blk_mq_tagset_busy_iter() a bit differently than I saw it
used on rdma.

With RDMA, I saw it used to call the core nvme_cancel_request, which
indeed passed the request back up to the blk layer. It then scheduled
the reconnect_ctrl_work routine, which would terminate the queues,
tearing down the qp's. I assume the teardown of the qp's also terminated
any memory regions that may have been mapped for ios sent over the qps.
I also assume the tgt side, upon the qp termination, tears down and
resolves any io state relative to the qp, io-related mrs for the qp,
etc. Note: I'm actually surprised. I would have assumed it's not good
practice to return things to the blk layer while there may still be
things coming off the wire that hit memory registrations and perhaps
muck with memory. I assume either the teardown is quick enough or the
accesses aren't destructive (memory related to the request's sgl stays
around long enough that the cancellations finish before the request
releases it, etc).
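For reference, a minimal sketch of that rdma-style path as I read it,
assuming the current core helpers (blk_mq_tagset_busy_iter and
nvme_cancel_request); everything else in it is my own naming:

#include <linux/blk-mq.h>
#include "nvme.h"		/* nvme_cancel_request(), struct nvme_ctrl */

/*
 * rdma-style teardown: hand every started request back to the blk
 * layer via the core cancel helper, then tear down the queues/qps;
 * the qp teardown is what actually stops anything still on the wire.
 */
static void example_rdma_style_cancel(struct nvme_ctrl *nctrl,
				      struct blk_mq_tag_set *tagset)
{
	/* complete each in-flight request back up to the blk layer */
	blk_mq_tagset_busy_iter(tagset, nvme_cancel_request, nctrl);

	/* ...followed by the transport tearing down its queues/qps */
}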
In FC, the queue is a logical thing, unlike rdma where it maps to the
qp. At transport connect, the target gives the initiator a handle for
the queue, which the initiator must include with any io it sends to the
target. When an SQE is sent on a SQ, FC effectively considers the SQE -
or perhaps better said, the command within the SQE - an "io", and
assigns a FC exchange to it. The SQE, and the queue handle, get sent in
the initial CMD IU on the exchange. Everything relative to the io
happens under that exchange context. The CQE is the last thing for the
io, and the transport closes the exchange upon the command's CQE being
delivered.
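To make that concrete, here's a purely illustrative view of what rides
in that initial CMD IU. The struct name, field names, and layout are
mine, not the FC-NVME wire format - the point is only that the SQE and
the queue handle travel together on the per-command exchange:

#include <linux/types.h>
#include <linux/nvme.h>		/* struct nvme_command */

/*
 * Illustrative only - a simplified picture of the per-command exchange
 * payload described above.
 */
struct example_fc_cmd_iu {
	__be64			connection_id;	/* queue handle from connect */
	__be32			data_len;	/* length of any io payload */
	struct nvme_command	sqe;		/* the SQE itself */
};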
So - if I'm tearing down queues/resetting controllers, not only do I
send a DISCONNECT LS to kill the validity of the handle, but I also need
to do something to cancel all of the FC exchange contexts that
corresponded to ios outstanding on the queue.

Given those exchange contexts are tied to LLDD contexts that also have
dma resources mapped to them, it isn't correct for me to just return the
requests back to the request queue. I want to cancel them in the LLDD
first, stopping any potential dma accesses and giving the resources back
to the LLDD, and then pass the request back to the blk layer. So, in the
blk_mq_tagset_busy_iter() call, I replaced the core cancel callback with
a local routine that kicks off callbacks into the LLDD to cancel the
jobs. From there, the LLDD will "finish" the jobs in the normal manner,
getting them back to the blk layer.
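In other words, something along these lines. This is only a sketch and
assumes the patch's nvme_fc_ctrl mirrors the rdma controller layout
(tag_set/admin_tag_set/ctrl members); the function name is mine:

#include <linux/blk-mq.h>

/*
 * Sketch: drive the quoted nvme_fc_kill_request as the busy-tag
 * callback so each op is first aborted in the LLDD; the LLDD then
 * completes the ops through its normal done path, which is what
 * finally returns the requests to the blk layer.
 */
static void example_fc_terminate_ios(struct nvme_fc_ctrl *ctrl)
{
	/* kick LLDD aborts for every started io on the io queues */
	blk_mq_tagset_busy_iter(&ctrl->tag_set, nvme_fc_kill_request,
				&ctrl->ctrl);

	/* and the same for anything pending on the admin queue */
	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set, nvme_fc_kill_request,
				&ctrl->ctrl);
}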
As for aborts in response to individual io timeouts - the "normal thing"
we did with scsi and other drive protocols - it really doesn't fit here,
and the nvme escalation policy of failing the queue or resetting the
controller should kick in (like on rdma). The FC transport could kill
the one io, but as the SQE and CQE are part of the io, there are
queue-related things it messes with when it does so that aren't good.
Granted, linux ignores some of those, making it tempting, but I don't
want to go there. If anything were to be done, it should be something
generic: have a core-layer timeout function that generates an ABORT
admin command to kill the io. If that fails or times out, then resort
to a transport controller reset routine.
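For illustration, a rough sketch of that escalation. The helper and its
arguments are hypothetical, not part of this patch; it only leans on the
core nvme_submit_sync_cmd and the standard Abort admin command:

#include <linux/blk-mq.h>
#include <linux/nvme.h>
#include "nvme.h"		/* nvme_submit_sync_cmd(), struct nvme_ctrl */

/*
 * Hypothetical core-layer step: on an io timeout, first try an NVMe
 * Abort admin command naming the timed-out command; only if that fails
 * or also times out would the transport's controller-reset routine be
 * invoked.
 */
static int example_abort_timed_out_io(struct nvme_ctrl *ctrl,
				      struct request *rq, u16 sqid)
{
	struct nvme_command cmd = { };

	cmd.abort.opcode = nvme_admin_abort_cmd;
	cmd.abort.cid = rq->tag;		/* command id of the victim io */
	cmd.abort.sqid = cpu_to_le16(sqid);	/* SQ the io was submitted on */

	/* synchronous for brevity; a real path would issue this async */
	return nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, 0);
}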
hope that helps
-- james