[PATCH 1/2] nvme-pci: Fix multiple races in nvme_setup_io_queues()
Casey Chen
cachen at purestorage.com
Tue Jun 22 21:14:21 PDT 2021
On 6/22/21 9:00 PM, Casey Chen wrote:
>
> On 6/22/21 6:53 PM, Keith Busch wrote:
>> On Tue, Jun 22, 2021 at 05:59:30PM -0700, Casey Chen wrote:
>>> On 6/22/21 8:06 AM, Keith Busch wrote:
>>>> On Mon, Jun 21, 2021 at 05:27:09PM -0700, Casey Chen wrote:
>>>>> + /*
>>>>> + * Free IRQ resources as soon as the NVMEQ_ENABLED bit transitions
>>>>> + * from set to unset. If there is a window before it is truly freed,
>>>>> + * pci_free_irq_vectors() jumping into this window will crash.
>>>>> + * Also take the lock to avoid racing with pci_free_irq_vectors()
>>>>> + * in the nvme_dev_disable() path.
>>>>> + */
>>>>> + mutex_lock(&dev->shutdown_lock);
>>>> Sorry, I wasn't clear in my previous review. All of the shutdown_locks
>>>> taken after initialization need to be mutex_trylock()'s. If you have to
>>>> wait to get the lock, the device is not going to be suitable for
>>>> continuing initialization after the lock is available.
>>>>
>>>> And looking at this again, if trylock is successful, I think you need
>>>> to verify the controller state is suitable for continuing the
>>>> initialization.
>>> I assume you mean all the mutex_lock() calls except the one in
>>> nvme_dev_disable(), right? I agree with replacing all mutex_lock()
>>> calls with mutex_trylock() except the one in nvme_dev_disable().
>> I only mean the locks you are adding. The existing usages are correct
>> as-is.
>>
>>> But I have doubts about checking the controller state after the lock
>>> is acquired, for two reasons:
>>>
>>> 1. What state can we use to check?
>> If the state is not NVME_CTRL_CONNECTING after successfully taking the
>> shutdown_lock, then the controller is not in a state that can proceed
>> with IO queue setup.
>>
>>> Suppose there is a state set on surprise removal; what if the state
>>> check happens before it is set, while the surprise removal is on its
>>> way?
>>>
>>> 2. Even if we don't check the controller state after acquiring the
>>> lock, the initialization still fails fairly soon if a surprise
>>> removal happens. We don't sacrifice any performance.
>> Nothing to do with performance; we're not in io path. The suggestion is
>> about avoiding access to a non-existent device. The timing of a surprise
>> removal can always defeat such a check, so my suggestion is not critical
>> since we expect that to be handled anyway, but in my experience,
>> platforms work better if the OS prevents such access when it knows it
>> will fail. It's not foolproof.
>>
>> But as I said, everything should work even without the check, so feel
>> free to omit it.
>
> Thanks for the explanation. I prefer not having the check, even though
> I had come up with a solution, as below.
>
> This looks cumbersome. If you can make it better, I might adopt it.
> Otherwise, I will send new patches without the check. :)
>
> +/*
> + * Try getting shutdown_lock while setting up IO queues.
> + * The caller must remember to unlock.
> + */
> +static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
> +{
> + /*
> + * Lock is being held by nvme_dev_disable(), fail early.
> + */
> + if (!mutex_trylock(&dev->shutdown_lock))
> + return -ENODEV;
> +
> + /*
> + * Controller is in wrong state, fail early.
> + */
> + if (dev->ctrl.state == NVME_CTRL_CONNECTING)
> + return -ENODEV;
> +
> + return 0;
> +}
> +
>
> Sample call:
>
> - mutex_lock(&dev->shutdown_lock);
> + if ((result = nvme_setup_io_queues_trylock(dev)))
> + return result;
>
Sorry, I sent that in a rush; it should be:
+/*
+ * Try getting shutdown_lock while setting up IO queues.
+ * The caller must remember to unlock on success.
+ */
+static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
+{
+ /*
+ * Lock is being held by nvme_dev_disable(), fail early.
+ */
+ if (!mutex_trylock(&dev->shutdown_lock))
+ return -ENODEV;
+
+ /*
+ * Controller is in wrong state, fail early.
+ */
+ if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+ mutex_unlock(&dev->shutdown_lock);
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
More information about the Linux-nvme
mailing list