nvmf/rdma host crash during heavy load and keep alive recovery
Steve Wise
swise at opengridcomputing.com
Thu Aug 11 06:58:38 PDT 2016
> >> The nvme_rdma_ctrl queue associated with the request is in RECONNECTING state:
> >>
> >> ctrl = {
> >>   state = NVME_CTRL_RECONNECTING,
> >>   lock = {
> >>
> >> So it should not be posting SQ WRs...
> >
> > kato kicks error recovery, nvme_rdma_error_recovery_work(), which calls
> > nvme_cancel_request() on each request. nvme_cancel_request() sets req->errors
> > to NVME_SC_ABORT_REQ. It then completes the request which ends up at
> > nvme_rdma_complete_rq() which queues it for retry:
> > ...
> > 	if (unlikely(rq->errors)) {
> > 		if (nvme_req_needs_retry(rq, rq->errors)) {
> > 			nvme_requeue_req(rq);
> > 			return;
> > 		}
> >
> > 		if (rq->cmd_type == REQ_TYPE_DRV_PRIV)
> > 			error = rq->errors;
> > 		else
> > 			error = nvme_error_status(rq->errors);
> > 	}
> > ...
> >
> > The retry will end up at nvme_rdma_queue_rq() which will try and post a send wr
> > to a freed qp...
> >
> > Should the canceled requests actually OR in bit NVME_SC_DNR? That is only done
> > in nvme_cancel_request() if the blk queue is dying:
>
> the DNR bit should not be set normally, only when we either don't want
> to requeue or we can't.
>
> >
> > ...
> > 	status = NVME_SC_ABORT_REQ;
> > 	if (blk_queue_dying(req->q))
> > 		status |= NVME_SC_DNR;
> > ...
> >
>
> couple of questions:
>
> 1. bringing down the interface means generating DEVICE_REMOVAL
> event?
>
No. Just ifconfig ethX down; sleep 10; ifconfig ethX up. This simply causes
the pending work requests to take longer to complete and kicks in the kato
logic.
> 2. keep-alive timeout expiring means that nvme_rdma_timeout() kicks
> error_recovery and sets:
>   rq->errors = NVME_SC_ABORT_REQ | NVME_SC_DNR
>
> So I'm not at all convinced that the keep-alive is the request that is
> being re-issued. Did you verify that?
The request that caused the crash had rq->errors == NVME_SC_ABORT_REQ. I'm not
sure that is always the case though. But this is very easy to reproduce, so I
should be able to drill down and add any debug code you think might help.
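
For reference, the status handling discussed above can be modeled in userspace. This is only a sketch, not kernel code: cancel_status(), status_allows_retry() and describe_status() are hypothetical names, and the model covers only the DNR part of the retry decision, leaving out whatever other conditions the real nvme_req_needs_retry() applies. The two constants are assumed to match include/linux/nvme.h.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the values in include/linux/nvme.h. */
#define NVME_SC_ABORT_REQ 0x7
#define NVME_SC_DNR       0x4000

/*
 * Hypothetical stand-in for the status chosen in nvme_cancel_request():
 * DNR is OR'ed in only when the block queue is dying.
 */
static uint16_t cancel_status(bool queue_dying)
{
	uint16_t status = NVME_SC_ABORT_REQ;

	if (queue_dying)
		status |= NVME_SC_DNR;
	return status;
}

/*
 * Simplified model of the retry decision: only the DNR bit is checked.
 * The real check in the driver may include further conditions.
 */
static bool status_allows_retry(uint16_t status)
{
	return !(status & NVME_SC_DNR);
}

/* Hypothetical debug helper: print a status value and the resulting decision. */
static void describe_status(const char *tag, uint16_t status)
{
	printf("%s: errors=0x%04x dnr=%d retry=%d\n",
	       tag, (unsigned int)status,
	       (status & NVME_SC_DNR) ? 1 : 0,
	       status_allows_retry(status) ? 1 : 0);
}
```

Under this model, a request canceled while the queue is not dying carries plain NVME_SC_ABORT_REQ and stays eligible for requeue, which is consistent with the rq->errors value seen in the crash. Calling describe_status("canceled", cancel_status(false)) from a small main() shows the decision for that case.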