[PATCH 1/2] nvme-pci: Fix multiple races in nvme_setup_io_queues()

Casey Chen cachen at purestorage.com
Tue Jun 22 21:00:08 PDT 2021


On 6/22/21 6:53 PM, Keith Busch wrote:
> On Tue, Jun 22, 2021 at 05:59:30PM -0700, Casey Chen wrote:
>> On 6/22/21 8:06 AM, Keith Busch wrote:
>>> On Mon, Jun 21, 2021 at 05:27:09PM -0700, Casey Chen wrote:
>>>> +	/*
>>>> +	 * Free IRQ resources as soon as NVMEQ_ENABLED bit transitions
>>>> +	 * from set to unset. If there is a window before the IRQ is truly
>>>> +	 * freed, pci_free_irq_vectors() jumping into this window will crash.
>>>> +	 * Take the lock to avoid racing with pci_free_irq_vectors() in
>>>> +	 * the nvme_dev_disable() path.
>>>> +	 */
>>>> +	mutex_lock(&dev->shutdown_lock);
>>> Sorry, I wasn't clear in previous review. All of the shutdown_locks
>>> taken after initialization need to be mutex_trylock()'s. If you have to
>>> wait to get the lock, the device is not going to be suitable for
>>> continuing initialization after the lock is available.
>>>
>>> And looking at this again, if trylock is successful, I think you need to
>>> verify controller state is suitable for continuing the initialization.
>> I assume you mean all the mutex_lock() except the one in nvme_dev_disable(),
>> right ? I agree with replacing all mutex_lock() with mutex_trylock() except
>> the one in nvme_dev_disable().
> I only mean the locks you are adding. The existing usages are correct
> as-is.
>
>> But I doubt about checking controller state
>> after lock being acquired. Two reasons:
>>
>> 1. What state can use to check ?
> If the state is not NVME_CTRL_CONNECTING after successfully taking the
> shutdown_lock, then the controller is not in a state that can proceed
> with IO queue setup.
>
>> Suppose there is a state set on surprise
>> removal, what if the state check happen before it is set while surprise
>> removal is on its way ?
>>
>> 2. Even if we don't check the controller state after acquiring the lock, the
>> initialization still fails fairly soon if a surprise removal happens. We don't
>> sacrifice any performance.
> Nothing to do with performance; we're not in io path. The suggestion is
> about avoiding access to a non-existent device. The timing of a surprise
> removal can always defeat such a check, so my suggestion is not critical
> since we expect that to be handled anyway, but in my experience,
> platforms work better if the OS prevents such access when it knows it
> will fail. It's not foolproof.
>
> But as I said, everything should work even without the check, so feel
> free to omit it.

Thanks for the explanation. I prefer not having the check, even though I had 
come up with a solution as below.

This looks cumbersome. If you can make it better, I might adopt it. 
Otherwise, I will send new patches without the check. :)

+/*
+ * Try getting shutdown_lock while setting up IO queues.
+ * Caller must remember to unlock on success.
+ */
+static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
+{
+       /*
+        * Lock is being held by nvme_dev_disable(), fail early.
+        */
+       if (!mutex_trylock(&dev->shutdown_lock))
+               return -ENODEV;
+
+       /*
+        * Controller is in a wrong state, unlock and fail early.
+        */
+       if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+               mutex_unlock(&dev->shutdown_lock);
+               return -ENODEV;
+       }
+
+       return 0;
+}
+

Sample call:

-       mutex_lock(&dev->shutdown_lock);
+       result = nvme_setup_io_queues_trylock(dev);
+       if (result)
+               return result;



