[PATCH rfc 0/1] Fix missing AENs when discovery controllers are disconnected
James Smart
jsmart2021 at gmail.com
Mon Sep 12 17:06:48 PDT 2022
On 9/12/2022 5:39 AM, Sagi Grimberg wrote:
>>> When a discovery controller is disconnected, no AENs will
>>> arrive to notify the host about discovery log-page changes.
>>>
>>> This attempts to fix it with a new connection parameter asking
>>> the kernel to send a dedicated udev event for this case. Prior
>>> attempt tried to use "connected" event already sent by the kernel
>>> however this also applied on the first connected, causing undesired
>>> side-effects when issuing a simple 'discover' command.
>>>
>>> The patchset includes the nvme-cli/libnvme counter-parts as well.
>>
>> Sagi,
>>
>> Do we really need another parameter/option ? seems awkward to need
>> userspace to figure it out.
>
> It has a rather specific use though..
>
> I was looking for something more fine-grained because I initially
> used the generic "connected" uevent which had some negative side
> effects, so I decided against adding yet another generic "reconnected"
> event that nvme-cli would use, and may/may-not fit other use-cases.
>
> I thought its better to have something that is as specific as possible
> like "send me an event only when a persistent discovery-controller
> reconnects" and I also used the name to indicate the semantic of
> the event, so in the future, a possible consumer of this event will
> know exactly what it is used for before opting to use it.
>
>> The transport certainly knows the difference between 1st time connect
>> and nth time re-connect - perhaps this should be a simple flag on the
>> nvme ctrl struct set by the transport ? Simple thing for rdma/tcp to
>> set flag at end of successful xxx_create_ctrl(), while FC, given it
>> does reconnects w/o a first successful connect needs a little more logic.
>
> I used ctrl->nr_reconnects to decide it, which introduced the constraint
> that transports would clear it _after_ calling nvme_start_ctrl(). Open
> to better suggestions though...
I was thinking this rather than the new option... The bit would only be
set *after* the first successful link-side connect, thus the initial
nvme_start_ctrl would not send the reconnect event. Every reconnect
thereafter would.
-- james
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2429b11eb9a8..43c8b6590164 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4814,6 +4814,10 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
nvme_enable_aen(ctrl);
+ if (nvme_discovery_ctrl(ctrl) &&
+ test_bit(NVME_CTRL_FABRIC_CONNECTED, &ctrl->flags))
+ nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");
+
if (ctrl->queue_count > 1) {
nvme_queue_scan(ctrl);
nvme_start_queues(ctrl);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 127abaf9ba5d..ff1dd8f999b0 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2947,6 +2947,7 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
goto out_delete_hw_queues;
ctrl->ioq_live = true;
+ set_bit(NVME_CTRL_FABRIC_CONNECTED, &ctrl->ctrl.flags);
return 0;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bdc0ff7ed9ab..9b4260d60516 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -356,6 +356,7 @@ struct nvme_ctrl {
unsigned long flags;
#define NVME_CTRL_FAILFAST_EXPIRED 0
#define NVME_CTRL_ADMIN_Q_STOPPED 1
+#define NVME_CTRL_FABRIC_CONNECTED 2
struct nvmf_ctrl_options *opts;
More information about the Linux-nvme
mailing list