[PATCH V1] nvme-pci: disable SR-IOV VFs on driver unbind

qinyuntan qinyuntan at linux.alibaba.com
Thu Jan 29 20:53:25 PST 2026


Hi All,

Thank you all for the insightful discussion!

I agree with Leon's point that not all devices are created equal when it
comes to SR-IOV handling during driver unbind.

Looking at existing driver implementations, I found two different 
approaches:

1) mlx5 - unconditionally disables SR-IOV in remove:

    drivers/net/ethernet/mellanox/mlx5/core/main.c:
    static void remove_one(struct pci_dev *pdev)
    {
        ...
        mlx5_sriov_disable(pdev, false);
        ...
    }

    drivers/net/ethernet/mellanox/mlx5/core/sriov.c:
    void mlx5_sriov_disable(struct pci_dev *pdev, bool num_vf_change)
    {
        struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
        struct devlink *devlink = priv_to_devlink(dev);
        int num_vfs = pci_num_vf(dev->pdev);

        pci_disable_sriov(pdev);  /* Always disable, no pci_vfs_assigned() check */
        devl_lock(devlink);
        mlx5_device_disable_sriov(dev, num_vfs, true, num_vf_change);
        devl_unlock(devlink);
    }

2) ixgbe - checks pci_vfs_assigned() and skips disable if VFs are in use:

    drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:
    static void ixgbe_remove(struct pci_dev *pdev)
    {
        ...
    #ifdef CONFIG_PCI_IOV
        ixgbe_disable_sriov(adapter);
    #endif
        ...
    }

    drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c:
    int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
    {
        ...
    #ifdef CONFIG_PCI_IOV
        if (pci_vfs_assigned(adapter->pdev)) {
            e_dev_warn("Unloading driver while VFs are assigned - VFs will not be deallocated\n");
            return -EPERM;
        }
        pci_disable_sriov(adapter->pdev);
    #endif
        ...
    }

Regarding the warning level discussion: I would prefer keeping it as
dev_warn() rather than downgrading it to dev_info(). As Leon mentioned,
some devices do require SR-IOV to be disabled when the PF is unbound,
and in those cases operators need to notice this message and act on it.
A warning level helps ensure it doesn't get lost in normal system logs.

Please let me know how you'd like to proceed.

Thanks,
Qinyun

On 1/27/26 4:48 PM, Christoph Hellwig wrote:
> On Tue, Jan 27, 2026 at 03:33:44PM +0800, Qinyun Tan wrote:
>> The NVMe PCI driver exports the sriov_configure callback via
>> pci_sriov_configure_simple(), which allows userspace to enable SR-IOV
>> VFs through sysfs. However, when the PF driver is unbound, the driver
>> does not disable SR-IOV, leaving VFs orphaned in the system.
> 
> That sounds dangerous.
> 
>> According to Documentation/PCI/pci-iov-howto.rst, PCI drivers that
>> support SR-IOV should call pci_disable_sriov() in their remove callback
>> to properly clean up VFs before the driver is unloaded.
> 
> Bjorn and other PCI folks: is there any reason to not do this in
> the PCI code and leave a landmine for the drivers?
> 
>> Fix this by disabling SR-IOV in nvme_remove(). If VFs are not assigned
>> to a guest, disable SR-IOV. If VFs are still assigned, emit a warning
>> since forcibly disabling would disrupt the guest.
> 
> Well, I think we have to disrupt it, at least for hot unplug. This
> sounds like we need some better handling in the core code as well.
> 
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 58f3097888a7..4f2dc13de48b 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3666,6 +3666,15 @@ static void nvme_remove(struct pci_dev *pdev)
>>   	nvme_stop_ctrl(&dev->ctrl);
>>   	nvme_remove_namespaces(&dev->ctrl);
>>   	nvme_dev_disable(dev, true);
>> +
>> +	if (pci_num_vf(pdev)) {
>> +		if (pci_vfs_assigned(pdev))
>> +			dev_warn(&pdev->dev,
>> +				 "WARNING: Removing PF while VFs are assigned - VFs will not be deallocated!\n");
>> +		else
>> +			pci_disable_sriov(pdev);
>> +	}
>> +
>>   	nvme_free_host_mem(dev);
>>   	nvme_dev_remove_admin(dev);
>>   	nvme_dbbuf_dma_free(dev);
>> -- 
>> 2.43.5
> ---end quoted text---




More information about the Linux-nvme mailing list