[PATCH] nvme: mark ctrl as DEAD if removing from error recovery

Sagi Grimberg sagi at grimberg.me
Mon Jul 10 02:27:31 PDT 2023


>>>>> I still want your patches for tcp/rdma that move the freeze.
>>>>> If you are not planning to send them, I swear I will :)
>>>>
>>>> Ming, can you please send the tcp/rdma patches that move the
>>>> freeze? As I said before, it addresses an existing issue with
>>>> requests unnecessarily blocked on a frozen queue instead of
>>>> failing over.
>>>
> Any chance to fix the current issue in one easy (backportable) way [1] first?
>>
>> There is, you suggested one. And I'm requesting you to send a patch for
>> it.
> 
> The patch is the one pointed to by link [1], and it can still be applied on
> the current Linus tree.
> 
> https://lore.kernel.org/linux-nvme/20230629064818.2070586-1-ming.lei@redhat.com/

This is separate from what I am talking about.

>>> All previous discussions on delayed freeze [2] are generic and apply to all
>>> nvme drivers, not to mention that this error-handling difference causes an
>>> extra maintenance burden. I still suggest converting all drivers in the same
>>> way, and will work along the approach in [1], aiming for v6.6.
>>
>> But we obviously hit a difference in expectations between different
>> drivers. In tcp/rdma there is currently an _existing_ bug, where
>> we freeze the queue on error recovery, and unfreeze only after we
>> reconnect. In the meantime, requests can be blocked on the frozen
>> request queue and not fail over like they should.
>>
>> In fabrics the delta between error recovery and reconnect can be (and
>> often will be) minutes or more. Hence I request that we solve _this_
>> issue, which is addressed by moving the freeze to the reconnect path.
>>
>> I personally think that pci should do this as well; at least
>> dual-ported multipath pci devices would prefer instant failover
>> over waiting for a full reset cycle. But Keith disagrees and I am
>> not going to push for it.
>>
>> Regardless of anything we do in pci, the tcp/rdma transport
>> freeze-blocking-failover _must_ be addressed.
> 
> It is a generic issue: freeze/unfreeze has to be strictly paired
> for every driver.
> 
> For any nvme driver, the imbalance can happen when error handling
> is involved, which is why I suggest fixing the issue in one generic
> way.

Ming, you are ignoring what I'm saying. I don't care if the
freeze/unfreeze is 100% balanced or not (for the sake of this
discussion).
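
For reference, the pairing Ming refers to boils down to blk-mq's
freeze refcount; a rough sketch of the semantics, not a patch:
--
/*
 * blk-mq freeze is reference-counted via q->mq_freeze_depth, so
 * every nvme_start_freeze() must eventually be matched by exactly
 * one nvme_unfreeze().  If an error-recovery path freezes but the
 * matching unfreeze never runs (e.g. because the controller is
 * removed first), the counter never returns to zero and new
 * submitters block in blk_queue_enter() indefinitely.
 */
--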

I'm talking about a _separate_ issue, where a queue
is frozen for potentially many minutes, blocking requests that
could otherwise fail over.
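
To spell out the window, here is the current (pre-patch) tcp ordering,
sketched from the code paths touched by the diffs below; rdma is
analogous:
--
/*
 * nvme_tcp_error_recovery_work()
 *     nvme_tcp_teardown_io_queues()
 *         nvme_start_freeze(ctrl);      <- queue frozen here
 *         nvme_quiesce_io_queues(ctrl);
 *         ...
 *     nvme_unquiesce_io_queues(ctrl);   <- pending requests fail fast
 *                                          and fail over, but NEW
 *                                          submitters block in
 *                                          blk_queue_enter() on the
 *                                          frozen queue
 *
 * ... reconnect attempts, potentially for minutes ...
 *
 * nvme_tcp_configure_io_queues(ctrl, new=false)
 *     ...
 *     nvme_unfreeze(ctrl);              <- only unfrozen here
 */
--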

>> So can you please submit a patch for each? Please phrase it as what
>> it is, a bug fix, so stable kernels can pick it up. And try to keep
>> it isolated to _only_ the freeze change so that it is easily
>> backportable.
> 
> The patch "[PATCH V2] nvme: mark ctrl as DEAD if removing from error
> recovery" can fix them all (including nvme tcp/fc's issue), and can be
> backported.

Ming, this is completely separate from what I'm talking about. That one
addresses the case where the controller is removed, while I'm talking
about error recovery and failover, which happen long before the
controller is removed.

> But as we discussed, we still want to call freeze/unfreeze in pairs, and
> I also suggest the following approach [2], which isn't suitable for
> backporting:
> 
> 	1) moving freeze into reset
> 	
> 	2) during resetting
> 	
> 	- freeze NS queues
> 	- unquiesce NS queues
> 	- nvme_wait_freeze()
> 	- update_nr_hw_queues
> 	- unfreeze NS queues
> 	
> 	3) meanwhile, change the driver's ->queue_rq() for the case where ctrl state is NVME_CTRL_CONNECTING:
> 	
> 	- if the request is FS IO with data, re-submit all bios of this request, and free the request
> 	
> 	- otherwise, fail the request
> 
> 
> [2] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@grimberg.me/T/#mfc96266b63eec3e4154f6843be72e5186a4055dc
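
For reference, step 3) of that approach would look roughly like the
sketch below. nvme_resubmit_fs_bios() is a hypothetical helper named
for illustration only; nvme_fail_nonready_command() exists today:
--
static blk_status_t nvme_queue_rq_connecting(struct nvme_ctrl *ctrl,
					     struct request *rq)
{
	/* FS IO carrying data: re-submit its bios and free the request */
	if (!blk_rq_is_passthrough(rq) && blk_rq_nr_phys_segments(rq))
		return nvme_resubmit_fs_bios(rq);	/* hypothetical */

	/* everything else (passthrough, flush without data) fails outright */
	return nvme_fail_nonready_command(ctrl, rq);
}
--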

Ming, please read again what my concern is. I'm talking about error
recovery freezing a queue, and unfreezing it only after we reconnect,
blocking requests that should fail over.

The controller removal when the request queue is frozen is a separate
issue, which we should address, but separately.

What I am requesting is that you send two patches, one for tcp and
one for rdma, identical to what you did in:
[PATCH 2/2] nvme: don't freeze/unfreeze queues from different contexts

rdma.c patch:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0eb79696fb73..354cce8853c1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -918,6 +918,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
  		goto out_cleanup_tagset;

  	if (!new) {
+		nvme_start_freeze(&ctrl->ctrl);
  		nvme_unquiesce_io_queues(&ctrl->ctrl);
  		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
  			/*
@@ -926,6 +927,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
  			 * to be safe.
  			 */
  			ret = -ENODEV;
+			nvme_unfreeze(&ctrl->ctrl);
  			goto out_wait_freeze_timed_out;
  		}
  		blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
@@ -975,7 +977,6 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
  		bool remove)
  {
  	if (ctrl->ctrl.queue_count > 1) {
-		nvme_start_freeze(&ctrl->ctrl);
  		nvme_quiesce_io_queues(&ctrl->ctrl);
  		nvme_sync_io_queues(&ctrl->ctrl);
  		nvme_rdma_stop_io_queues(ctrl);
--
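
In both drivers the net effect is the same: error recovery now only
quiesces, so pending requests fail over immediately, and the freeze
spans just the reconnect-side blk_mq_update_nr_hw_queues() update,
which is the only place it is actually needed. The tcp patch below is
the identical change.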

tcp.c patch:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..5ae08e9cb16d 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1909,6 +1909,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
  		goto out_cleanup_connect_q;

  	if (!new) {
+		nvme_start_freeze(ctrl);
  		nvme_unquiesce_io_queues(ctrl);
  		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
  			/*
@@ -1917,6 +1918,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
  			 * to be safe.
  			 */
  			ret = -ENODEV;
+			nvme_unfreeze(ctrl);
  			goto out_wait_freeze_timed_out;
  		}
  		blk_mq_update_nr_hw_queues(ctrl->tagset,
@@ -2021,7 +2023,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
  	if (ctrl->queue_count <= 1)
  		return;
  	nvme_quiesce_admin_queue(ctrl);
-	nvme_start_freeze(ctrl);
  	nvme_quiesce_io_queues(ctrl);
  	nvme_sync_io_queues(ctrl);
  	nvme_tcp_stop_io_queues(ctrl);
--

Nothing more.


