[PATCH V1] nvme-pci: disable SR-IOV VFs on driver unbind
qinyuntan
qinyuntan at linux.alibaba.com
Thu Jan 29 20:53:25 PST 2026
Hi All,
Thank you all for the insightful discussion!
I agree with Leon's point that not all devices are created equal when it
comes to SR-IOV handling during driver unbind.
Looking at existing driver implementations, I found two different
approaches:
1) mlx5 - unconditionally disables SR-IOV in remove:
drivers/net/ethernet/mellanox/mlx5/core/main.c:
static void remove_one(struct pci_dev *pdev)
{
	...
	mlx5_sriov_disable(pdev, false);
	...
}
drivers/net/ethernet/mellanox/mlx5/core/sriov.c:
void mlx5_sriov_disable(struct pci_dev *pdev, bool num_vf_change)
{
	struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
	struct devlink *devlink = priv_to_devlink(dev);
	int num_vfs = pci_num_vf(dev->pdev);

	pci_disable_sriov(pdev); /* Always disable, no pci_vfs_assigned() check */
	devl_lock(devlink);
	mlx5_device_disable_sriov(dev, num_vfs, true, num_vf_change);
	devl_unlock(devlink);
}
2) ixgbe - checks pci_vfs_assigned() and skips disabling SR-IOV if VFs are assigned:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:
static void ixgbe_remove(struct pci_dev *pdev)
{
	...
#ifdef CONFIG_PCI_IOV
	ixgbe_disable_sriov(adapter);
#endif
	...
}
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c (in ixgbe_disable_sriov()):
#ifdef CONFIG_PCI_IOV
	if (pci_vfs_assigned(adapter->pdev)) {
		e_dev_warn("Unloading driver while VFs are assigned - VFs will not be deallocated\n");
		return -EPERM;
	}
	pci_disable_sriov(adapter->pdev);
#endif
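To make the contrast concrete, here is a minimal userspace model of the two policies. It is a sketch, not kernel code: `struct fake_pf`, `fake_disable_sriov()`, `mlx5_style_disable_sriov()` and `ixgbe_style_disable_sriov()` are hypothetical names, and the two counters only stand in for what pci_num_vf() and pci_vfs_assigned() would report.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a PF; not a kernel structure. */
struct fake_pf {
	int num_vfs;      /* what pci_num_vf() would report */
	int vfs_assigned; /* what pci_vfs_assigned() would report */
};

/* Models pci_disable_sriov(): every VF goes away, assigned or not. */
static void fake_disable_sriov(struct fake_pf *pf)
{
	pf->num_vfs = 0;
	pf->vfs_assigned = 0;
}

/* mlx5-style: no pci_vfs_assigned() check, always tear down the VFs. */
static int mlx5_style_disable_sriov(struct fake_pf *pf)
{
	fake_disable_sriov(pf);
	return 0;
}

/* ixgbe-style: refuse to tear down VFs that a guest may still be using. */
static int ixgbe_style_disable_sriov(struct fake_pf *pf)
{
	if (pf->vfs_assigned) {
		fprintf(stderr, "Unloading driver while VFs are assigned - "
			"VFs will not be deallocated\n");
		return -EPERM;
	}
	fake_disable_sriov(pf);
	return 0;
}
```

With `struct fake_pf pf = { .num_vfs = 4, .vfs_assigned = 1 };`, the mlx5-style helper returns 0 and leaves no VFs, while the ixgbe-style helper returns -EPERM and leaves all four in place. Since ixgbe_remove() ignores that return value, the PF still unbinds and the assigned VFs stay behind.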
Regarding the warning level discussion: I would prefer keeping it as
dev_warn() rather than downgrading to dev_info(). As Leon mentioned,
some devices do require SR-IOV to be disabled when the PF is unbound,
and for those cases, this warning is important for operators to notice
and take action. A warning level helps ensure it doesn't get lost in
normal system logs.
Please let me know how you'd like to proceed.
Thanks,
Qinyun
On 1/27/26 4:48 PM, Christoph Hellwig wrote:
> On Tue, Jan 27, 2026 at 03:33:44PM +0800, Qinyun Tan wrote:
>> The NVMe PCI driver exports the sriov_configure callback via
>> pci_sriov_configure_simple(), which allows userspace to enable SR-IOV
>> VFs through sysfs. However, when the PF driver is unbound, the driver
>> does not disable SR-IOV, leaving VFs orphaned in the system.
>
> That sounds dangerous.
>
>> According to Documentation/PCI/pci-iov-howto.rst, PCI drivers that
>> support SR-IOV should call pci_disable_sriov() in their remove callback
>> to properly clean up VFs before the driver is unloaded.
>
> Bjorn and other PCI folks: is there any reason to not do this in
> the PCI code and leave a landmine for the drivers?
>
>> Fix this by disabling SR-IOV in nvme_remove(). If VFs are not assigned
>> to a guest, disable SR-IOV. If VFs are still assigned, emit a warning
>> since forcibly disabling would disrupt the guest.
>
> Well, I think we have to disrupt it, at least for hot unplug. This
> sounds like we need some better handling in the core code as well.
>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 58f3097888a7..4f2dc13de48b 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3666,6 +3666,15 @@ static void nvme_remove(struct pci_dev *pdev)
>>  	nvme_stop_ctrl(&dev->ctrl);
>>  	nvme_remove_namespaces(&dev->ctrl);
>>  	nvme_dev_disable(dev, true);
>> +
>> +	if (pci_num_vf(pdev)) {
>> +		if (pci_vfs_assigned(pdev))
>> +			dev_warn(&pdev->dev,
>> +				"WARNING: Removing PF while VFs are assigned - VFs will not be deallocated!\n");
>> +		else
>> +			pci_disable_sriov(pdev);
>> +	}
>> +
>>  	nvme_free_host_mem(dev);
>>  	nvme_dev_remove_admin(dev);
>>  	nvme_dbbuf_dma_free(dev);
>> --
>> 2.43.5
> ---end quoted text---
More information about the Linux-nvme
mailing list