[PATCH v12 10/26] nvme-tcp: Deal with netdevice DOWN events

Thu Aug 17 07:09:45 PDT 2023

Sagi Grimberg <sagi at grimberg.me> writes:
>>>> +     switch (event) {
>>>> +     case NETDEV_GOING_DOWN:
>>>> +             mutex_lock(&nvme_tcp_ctrl_mutex);
>>>> +             list_for_each_entry(ctrl, &nvme_tcp_ctrl_list, list) {
>>>> +                     if (ndev == ctrl->offloading_netdev)
>>>> +                             nvme_tcp_error_recovery(&ctrl->ctrl);
>>>> +             }
>>>> +             mutex_unlock(&nvme_tcp_ctrl_mutex);
>>>> +             flush_workqueue(nvme_reset_wq);
>>>
>>> In what context is this called? because every time we flush a workqueue,
>>> lockdep finds another reason to complain about something...
>>
>> Thanks for highlighting this, we re-checked it and we found that we are
>> covered by nvme_tcp_error_recovery(), we can remove the
>> flush_workqueue() call above.
>
> Don't you need to flush at least err_work? How do you know that it
> completed and put all the references?

Our bad, we do need to wait for the netdev reference to be put, so we
must keep the flush_workqueue() call.
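
To illustrate the point, here is a minimal userspace model of the
ordering problem. It is not kernel code: every model_* name is
hypothetical, and it assumes (as discussed above) that error recovery
only queues err_work and that err_work is what drops the netdev
reference. Under those assumptions the reference is still held when the
handler returns unless the queued work is flushed first.

```c
/* Userspace model only; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

struct model_netdev { int refcnt; };

struct model_ctrl {
	struct model_netdev *offloading_netdev;
	bool err_work_pending;
};

/* Pending deferred work, standing in for the reset workqueue. */
static struct model_ctrl *pending[MAX_WORK];
static size_t npending;

/* Assumed behaviour of error recovery: queue work and return at once. */
static void model_error_recovery(struct model_ctrl *ctrl)
{
	if (ctrl->err_work_pending || npending == MAX_WORK)
		return;
	ctrl->err_work_pending = true;
	pending[npending++] = ctrl;
}

/* The deferred work is what actually puts the netdev reference. */
static void model_err_work(struct model_ctrl *ctrl)
{
	assert(ctrl->err_work_pending);
	if (ctrl->offloading_netdev) {
		ctrl->offloading_netdev->refcnt--;
		ctrl->offloading_netdev = NULL;
	}
	ctrl->err_work_pending = false;
}

/* Stand-in for flush_workqueue(): run everything queued so far. */
static void model_flush(void)
{
	for (size_t i = 0; i < npending; i++)
		model_err_work(pending[i]);
	npending = 0;
}

/* GOING_DOWN handler; returns the refcount observed on return. */
static int model_going_down(struct model_netdev *ndev,
			    struct model_ctrl *ctrls, size_t n, bool flush)
{
	for (size_t i = 0; i < n; i++)
		if (ctrls[i].offloading_netdev == ndev)
			model_error_recovery(&ctrls[i]);
	if (flush)
		model_flush();
	return ndev->refcnt;
}
```

Calling model_going_down() with flush == false returns with the
references still held; with flush == true they are all put before the
handler returns, which is the property we need from the real handler.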

We did test with lockdep but did not notice any warnings.

As for the context: when you set the link down, the event handler runs
in the context of the process issuing the netlink syscall.

So if you run "ip link set X down" it would be (simplified):

"ip" -> syscall -> netlink api -> ... -> do_setlink -> call_netdevice_notifiers_info.