[PATCH v3 2/9] nvme-fabrics: allow to queue requests for live queues

Sagi Grimberg sagi at grimberg.me
Thu Aug 20 12:58:52 EDT 2020


>> Right now we are failing requests based on the controller state (which
>> is checked inline in nvmf_check_ready) however we should definitely
>> accept requests if the queue is live.
>>
>> When entering controller reset, we transition the controller into
>> NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
>> requests (have blk_noretry_request set).
>>
>> This is also the case for NVME_REQ_USER for the wrong reason. There
>> shouldn't be any reason for us to reject this I/O in a controller reset.
>> We do want to prevent passthru commands on the admin queue because we
>> need the controller to fully initialize first before we let user passthru
>> admin commands to be issued.
>>
>> In a non-mpath setup, this means that the requests will simply be
>> requeued over and over forever not allowing the q_usage_counter to drop
>> its final reference, causing controller reset to hang if running
>> concurrently with heavy I/O.
> 
> I'm still rather bothered with the admin queue exception.  And given that
> the q_usage_counter problem should only really be an issue for file system
> requests, as passthrough requests do not automatically get retried, why
> can't we just reject all user commands to be symmetric and straightforward?
> The callers in userspace need to be able to cope with retryable errors
> anyway.

Looking at the code again, I think we can kill it as well.

The concern is we may issue user generated admin commands before
the controller is enabled (generating an unforced error just because
we queued too early). That used to be the case when the admin connect
used the admin_q which meant we needed to unquiesce before, but
now that the admin connect uses the fabrics_q, that should no
longer be an issue.

In nvme-tcp, we unquiesce after we enable the ctrl:
--
         error = nvme_tcp_start_queue(ctrl, 0);
         if (error)
                 goto out_cleanup_queue;

         error = nvme_enable_ctrl(ctrl);
         if (error)
                 goto out_stop_queue;

         blk_mq_unquiesce_queue(ctrl->admin_q);
--

Also in nvme-rdma:
--
         error = nvme_rdma_start_queue(ctrl, 0);
         if (error)
                 goto out_cleanup_queue;

         error = nvme_enable_ctrl(&ctrl->ctrl);
         if (error)
                 goto out_stop_queue;

         ctrl->ctrl.max_segments = ctrl->max_fr_pages;
         ctrl->ctrl.max_hw_sectors = ctrl->max_fr_pages << (ilog2(SZ_4K) - 9);
         if (pi_capable)
                 ctrl->ctrl.max_integrity_segments = ctrl->max_fr_pages;
         else
                 ctrl->ctrl.max_integrity_segments = 0;

         blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
--

And also in nvme-fc:
--
         ret = nvmf_connect_admin_queue(&ctrl->ctrl);
         if (ret)
                 goto out_disconnect_admin_queue;

         set_bit(NVME_FC_Q_LIVE, &ctrl->queues[0].flags);

         /*
          * Check controller capabilities
          *
          * todo:- add code to check if ctrl attributes changed from
          * prior connection values
          */

         ret = nvme_enable_ctrl(&ctrl->ctrl);
         if (ret)
                 goto out_disconnect_admin_queue;

         ctrl->ctrl.max_segments = ctrl->lport->ops->max_sgl_segments;
         ctrl->ctrl.max_hw_sectors = ctrl->ctrl.max_segments <<
                                                 (ilog2(SZ_4K) - 9);

         blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
--
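
To make the ordering argument concrete, here is a toy userspace model of it. This is not kernel code: every toy_* name is made up for illustration, the "quiesced" flag only approximates what a quiesced blk-mq queue does (hold requests instead of dispatching them), and the admin connect is assumed to travel on a separate queue that is not modeled here. It only sketches why a passthru admin command submitted early should not reach the controller before it is enabled, as long as the unquiesce comes after the enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum { TOY_MAX_PENDING = 8 };

struct toy_queue {
	bool quiesced;                 /* true: hold requests, do not dispatch */
	int pending[TOY_MAX_PENDING];  /* ids of held requests */
	int npending;
};

struct toy_ctrl {
	bool enabled;                  /* set once the controller is enabled */
	int early;                     /* commands seen before enable */
	int handled;                   /* commands seen after enable */
	struct toy_queue admin_q;      /* carries user passthru admin commands */
};

/* The "controller" receives a command and records when it arrived. */
static void toy_dispatch(struct toy_ctrl *c, int id)
{
	(void)id;
	if (c->enabled)
		c->handled++;
	else
		c->early++;
}

/* Submit a user admin command; it is held while the queue is quiesced. */
static int toy_submit(struct toy_ctrl *c, int id)
{
	struct toy_queue *q = &c->admin_q;

	if (q->quiesced) {
		if (q->npending == TOY_MAX_PENDING)
			return -1;     /* no room left to hold the request */
		q->pending[q->npending++] = id;
		return 0;
	}
	toy_dispatch(c, id);
	return 0;
}

/* Release the queue and flush everything that was held. */
static void toy_unquiesce(struct toy_ctrl *c)
{
	struct toy_queue *q = &c->admin_q;

	assert(q->npending <= TOY_MAX_PENDING);
	q->quiesced = false;
	for (int i = 0; i < q->npending; i++)
		toy_dispatch(c, q->pending[i]);
	q->npending = 0;
}

/* Same order as the excerpts above: enable first, unquiesce last. */
static void toy_setup_admin(struct toy_ctrl *c)
{
	c->enabled = true;
	toy_unquiesce(c);
}
```

Starting from a controller with admin_q.quiesced set, commands passed to toy_submit() before toy_setup_admin() are held and then counted in "handled", never in "early". Calling toy_unquiesce() before setting "enabled" would count them in "early" instead, which is the case the old check was guarding against.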

James, can you please check whether this is still an issue?

>>   	/*
>> +	 * currently we have a problem sending passthru commands
>> +	 * on the admin_q if the controller is not LIVE because we can't
>> +	 * make sure that they are going out after the admin connect,
>> +	 * controller enable and/or other commands in the initialization
>> +	 * sequence. until the controller will be LIVE, fail with
>> +	 * BLK_STS_RESOURCE so that they will be rescheduled.
>>   	 */
> 
> Nit: please start multi-line comments with a capital letter.  Also I
> think some of the lines do not nearly use up the 80 characters available.

Will fix.


