[RFC PATCH] nvme: fix RCU hole that allowed for endless looping in multipath round robin
John Meneghini
jmeneghi at redhat.com
Wed Mar 23 12:07:04 PDT 2022
This is good. I'm glad everyone agrees.
Chris, please ask our partner to test this patch in their testbed and verify that
this fixes the soft-lockup problem they are seeing. I believe they have the ability
to reproduce this problem on their 8.6 testbed.
If that works then please re-post a patch so people can consider this for v5.18.
/John
On 3/23/22 11:34, Christoph Hellwig wrote:
> On Wed, Mar 23, 2022 at 04:54:26PM +0200, Sagi Grimberg wrote:
>>
>>
>> On 3/22/22 00:43, Chris Leech wrote:
>>> Make nvme_ns_remove match the assumptions elsewhere.
>>>
>>> 1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
>>> running in __nvme_find_path or nvme_round_robin_path that will
>>> re-assign this ns to current_path.
>>>
>>> 2) Any matching current_path entries need to be cleared before removing
>>> from the siblings list, to prevent calling nvme_round_robin_path with
>>> an "old" ns that's off list.
>>>
>>> 3) Finally the list_del_rcu can happen, and then synchronize again
>>> before releasing any reference counts.
>>> ---
>>> drivers/nvme/host/core.c | 13 +++++++++----
>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index fd4720d37cc0..20778dc9224c 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -3917,6 +3917,15 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>>  	set_capacity(ns->disk, 0);
>>>  	nvme_fault_inject_fini(&ns->fault_inject);
>>> +	/* ensure that !NVME_NS_READY is seen
>>> +	 * to prevent this ns going back in current_path
>>> +	 */
>>> +	synchronize_srcu(&ns->head->srcu);
>>> +
>>> +	/* wait for concurrent submissions */
>>> +	if (nvme_mpath_clear_current_path(ns))
>>> +		synchronize_srcu(&ns->head->srcu);
>>
>> Nothing prevents it from being reselected again.
>> This is what drove the placement of this after the
>> ns is removed from the head->list. But that was before
>> the selection looked into NVME_NS_READY flag...
>>
>> This looks legit to me...
>
> Yes, this looks pretty sensible. I'm tempted to just queue it up
> for 5.18.
>
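
For readers following the thread, the three-step ordering in the quoted commit message can be sketched as a small single-threaded userspace model. This is only an illustration under stated assumptions, not the kernel code: every model_* name is hypothetical, path selection is reduced to "first ready sibling" rather than the real round-robin policy, and synchronize_srcu() is replaced by a counter because there are no concurrent readers here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATHS 8

/* Hypothetical stand-ins; NOT the kernel's struct nvme_ns / nvme_ns_head. */
struct model_ns {
	bool ready;				/* models NVME_NS_READY */
};

struct model_head {
	struct model_ns *siblings[MAX_PATHS];	/* models the sibling list */
	size_t nr;
	struct model_ns *current_path;		/* models a cached current_path */
	unsigned int sync_calls;		/* counts modeled synchronize_srcu() */
};

/* Placeholder for synchronize_srcu(): no readers in this model, so just count. */
static void model_synchronize(struct model_head *head)
{
	head->sync_calls++;
}

static bool model_add_path(struct model_head *head, struct model_ns *ns)
{
	if (head->nr == MAX_PATHS)
		return false;
	ns->ready = true;
	head->siblings[head->nr++] = ns;
	return true;
}

/* Simplified selection: pick the first ready sibling and cache it. */
static struct model_ns *model_select_path(struct model_head *head)
{
	size_t i;

	for (i = 0; i < head->nr; i++) {
		if (head->siblings[i]->ready) {
			head->current_path = head->siblings[i];
			return head->current_path;
		}
	}
	head->current_path = NULL;
	return NULL;
}

static void model_ns_remove(struct model_head *head, struct model_ns *ns)
{
	size_t i, j;

	/* 1) Mark not ready and wait, so selection cannot re-cache this ns. */
	ns->ready = false;
	model_synchronize(head);

	/* 2) Clear a matching cached path while ns is still on the list. */
	if (head->current_path == ns) {
		head->current_path = NULL;
		model_synchronize(head);
	}

	/* 3) Unlink, then wait again before any references are released. */
	for (i = 0; i < head->nr; i++) {
		if (head->siblings[i] == ns) {
			for (j = i + 1; j < head->nr; j++)
				head->siblings[j - 1] = head->siblings[j];
			head->nr--;
			break;
		}
	}
	model_synchronize(head);
}
```

The point of the ordering in this model is that by the time the entry leaves the list, it is already not ready and no cached path refers to it, so selection is never started from an off-list entry.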