[PATCH 1/2] nvme-pci: Fix multiple races in nvme_setup_io_queues()

Tue Jun 22 21:14:21 PDT 2021

On 6/22/21 9:00 PM, Casey Chen wrote:
>
> On 6/22/21 6:53 PM, Keith Busch wrote:
>> On Tue, Jun 22, 2021 at 05:59:30PM -0700, Casey Chen wrote:
>>> On 6/22/21 8:06 AM, Keith Busch wrote:
>>>> On Mon, Jun 21, 2021 at 05:27:09PM -0700, Casey Chen wrote:
>>>>> +    /*
>>>>> +     * Free IRQ resources as soon as NVMEQ_ENABLED bit transitions
>>>>> +     * from set to unset. If there is a window to it is truely 
>>>>> freed,
>>>>> +     * pci_free_irq_vectors() jumping into this window will crash.
>>>>> +     * And take lock to avoid racing with pci_free_irq_vectors() in
>>>>> +     * nvme_dev_disable() path.
>>>>> +     */
>>>>> +    mutex_lock(&dev->shutdown_lock)
>>>> Sorry, I wasn't clear in previous review. All of the shutdown_locks
>>>> taken after initialization need to by mutex_trylock()'s. If you 
>>>> have to
>>>> wait to get the lock, the device is not going to be suitable for
>>>> continuing initialization after the lock is available.
>>>>
>>>> And looking at this again, if trylock is successful, I think you 
>>>> need to
>>>> verify controller state is suitable for continuing the initialization.
>>> I assume you mean all the mutex_lock() except the one in 
>>> nvme_dev_disable(),
>>> right ? I agree with replacing all mutex_lock() with mutex_trylock() 
>>> except
>>> the one in nvme_dev_disable().
>> I only mean the locks you are adding. The existing usages are correct
>> as-is.
>>
>>> But I doubt about checking controller state
>>> after lock being acquired. Two reasons:
>>>
>>> 1. What state can use to check ?
>> If the state is not NVME_CTRL_CONNECTING after successfully taking the
>> shutdown_lock, then the controller is not in a state that can proceed
>> with IO queue setup.
>>
>>> Suppose there is a state set on surprise
>>> removal, what if the state check happen before it is set while surprise
>>> removal is on its way ?
>>>
>>> 2. Even we don't check controller state after acquiring lock, the
>>> initialization still fail fairly soon if surprise removal happens. 
>>> We don't
>>> sacrifice any performance.
>> Nothing to do with performance; we're not in io path. The suggestion is
>> about avoiding access to a non-existent device. The timing of a surprise
>> removal can always defeat such a check, so my suggestion is not critical
>> since we expect that to be handled anyway, but in my experience,
>> platforms work better if the OS prevents such access when it knows it
>> will fail. It's not full-proof.
>>
>> But as I said, everything should work even without the check, so feel
>> free to omit it.
>
> Thanks for explanation. I prefer not having the check even I had came 
> up with a solution as below.
>
> This looks cumbersome. If you can make it better, I might adopt it. 
> Otherwise, I will send new patches without the check. :)
>
> +/*
> + * Try getting shutdown_lock while setting up IO queues.
> + * Caller remember to unlock.
> + */
> +static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
> +{
> +       /*
> +       * Lock is being held by nvme_dev_disable(), fail early.
> +       */
> +       if (!mutex_trylock(&dev->shutdown_lock))
> +               return -ENODEV;
> +
> +       /*
> +       * Controller is in wrong state, fail early.
> +        */
> +       if (dev->ctrl.state == NVME_CTRL_CONNECTING)
> +               return -ENODEV;
> +
> +       return 0;
> +}
> +
>
> Sample call:
>
> -       mutex_lock(&dev->shutdown_lock);
> +       if ((result = nvme_setup_io_queues_trylock(dev)))
> +               return result;
>

Sorry I sent in a rush, should be:

+/*
+ * Try getting shutdown_lock while setting up IO queues.
+ * Caller remember to unlock if success.
+ */
+static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
+{
+       /*
+       * Lock is being held by nvme_dev_disable(), fail early.
+       */
+       if (!mutex_trylock(&dev->shutdown_lock))
+               return -ENODEV;
+
+       /*
+       * Controller is in wrong state, fail early.
+        */
+       if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+               mutex_unlock(&dev->shutdown_lock);
+               return -ENODEV;
+       }
+
+       return 0;
+}
+