[PATCH rfc 0/1] Fix missing AENs when discovery controllers are disconnected

James Smart jsmart2021 at gmail.com
Wed Sep 14 08:00:24 PDT 2022


On 9/14/2022 3:29 AM, Sagi Grimberg wrote:
> 
>> I was thinking this rather than the new option... The bit would only 
>> be set *after* the first successful link-side connect, thus the 
>> initial nvme_start_ctrl would not send the reconnect event. Every 
>> reconnect thereafter would.
>>
>> -- james
>>
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 2429b11eb9a8..43c8b6590164 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -4814,6 +4814,10 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
>>
>>       nvme_enable_aen(ctrl);
>>
>> +    if (nvme_discovery_ctrl(ctrl) &&
>> +        test_bit(NVME_CTRL_FABRIC_CONNECTED, &ctrl->flags))
>> +        nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");
>> +
>>       if (ctrl->queue_count > 1) {
>>           nvme_queue_scan(ctrl);
>>           nvme_start_queues(ctrl);
>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
>> index 127abaf9ba5d..ff1dd8f999b0 100644
>> --- a/drivers/nvme/host/fc.c
>> +++ b/drivers/nvme/host/fc.c
>> @@ -2947,6 +2947,7 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
>>           goto out_delete_hw_queues;
>>
>>       ctrl->ioq_live = true;
>> +    set_bit(NVME_CTRL_FABRIC_CONNECTED, &ctrl->ctrl.flags);
> 
> Isn't this set before calling nvme_start_ctrl()?

frick - sent the wrong tweak for fc:

this should be:
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 127abaf9ba5d..630d8e50f29c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3201,6 +3201,8 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
         if (changed)
                 nvme_start_ctrl(&ctrl->ctrl);

+       set_bit(NVME_CTRL_FABRIC_CONNECTED, &ctrl->ctrl.flags);
+
         return 0;       /* Success */

  out_term_aen_ops:


So in all cases, it's now set immediately after calling nvme_start_ctrl. 
Won't be set on 1st call but will be all on all reamaing.


> 
> But this is orthogonal to the user argument, is there a specific
> reason to why not add an explicit argument requesting that?
> Because every userspace that will setup a persistent discovery
> controller will need this event?

Well my aversion is to remove relationships between user-space and 
kernel.  It seemed very odd to go to user-space to then set a custom 
option and this handshaking looked like a lot of work vs just generating 
a reconnect event on a discovery controller.

In my mind think we should be doing one of the following:
a) generate connect and reconnect events generically. If they are 
handled, great, if not that's fine. Handler is free to filter by device 
attributes as to what it does

I assume there's some issue here that you saw with the original attempt 
as reconnects may post while connects or reconnects are still being handled.

b) go down the path of the rediscover option - but the kernel routine 
should never require it to be a discovery controller. If the option is 
set, send the rediscover event. Removes events that won't be handled.
The filtering in the lib to only set the option on devices that the lib 
cares about can continue.

-- james



More information about the Linux-nvme mailing list