[PATCH V1] nvme-pci: disable SR-IOV VFs on driver unbind
Leon Romanovsky
leon at kernel.org
Tue Jan 27 06:31:43 PST 2026
On Tue, Jan 27, 2026 at 09:48:07AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 27, 2026 at 03:33:44PM +0800, Qinyun Tan wrote:
> > The NVMe PCI driver exports the sriov_configure callback via
> > pci_sriov_configure_simple(), which allows userspace to enable SR-IOV
> > VFs through sysfs. However, when the PF driver is unbound, the driver
> > does not disable SR-IOV, leaving VFs orphaned in the system.
>
> That sounds dangerous.
It is not. In a real SR-IOV device, VFs are created by the hardware and
are independent of their PF. There are several use cases where an
operator unbinds the PF and reuses it to improve overall device
utilization.
We have already discussed this in the context of Rust.
https://lore.kernel.org/all/20251122185701.GZ18335@unreal/
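
For illustration, the operator workflow described above can be sketched roughly as follows. This is only a minimal sketch and is not taken from any patch: the helper names and the overridable SYSFS_ROOT variable are hypothetical, and the PCI address in the usage note is a placeholder. It uses only the standard sysfs attributes (sriov_numvfs, driver/unbind) and the virtfnN links, and it assumes a PF driver that does not call pci_disable_sriov() in its remove callback.

```shell
#!/bin/sh
# Hypothetical helpers. SYSFS_ROOT can be overridden so the logic can be
# exercised against a fake directory tree instead of the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Ask the PF driver to enable N VFs via the sriov_numvfs attribute.
pf_enable_vfs() {
    pf="$1"; n="$2"
    echo "$n" > "$SYSFS_ROOT/bus/pci/devices/$pf/sriov_numvfs"
}

# Unbind the PF from whatever driver currently owns it.
pf_unbind() {
    pf="$1"
    echo "$pf" > "$SYSFS_ROOT/bus/pci/devices/$pf/driver/unbind"
}

# Print how many virtfnN entries exist under the PF device directory.
pf_count_vfs() {
    pf="$1"
    count=0
    for link in "$SYSFS_ROOT/bus/pci/devices/$pf"/virtfn*; do
        # Skip the literal pattern when nothing matches the glob.
        [ -e "$link" ] || [ -L "$link" ] || continue
        count=$((count + 1))
    done
    echo "$count"
}
```

As root, something like `pf_enable_vfs 0000:01:00.0 4; pf_unbind 0000:01:00.0; pf_count_vfs 0000:01:00.0` would then report how many VFs are still present after the PF has been unbound.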
>
> > According to Documentation/PCI/pci-iov-howto.rst, PCI drivers that
> > support SR-IOV should call pci_disable_sriov() in their remove callback
> > to properly clean up VFs before the driver is unloaded.
I could not find that claim in Documentation/PCI/pci-iov-howto.rst.
Can you point to the specific sentence that supports it?
>
> Bjorn and other PCI folks: is there any reason to not do this in
> the PCI code and leave a landmine for the drivers?
It will break a lot of real users.
>
> > Fix this by disabling SR-IOV in nvme_remove(). If VFs are not assigned
> > to a guest, disable SR-IOV. If VFs are still assigned, emit a warning
> > since forcibly disabling would disrupt the guest.
>
> Well, I think we have to disrupt it, at least for hot unplug. This
> sounds like we need some better handling in the core code as well.
As mentioned earlier, there are valid users of this functionality relying on
legitimate devices that operate correctly regardless of whether the PF is
bound.
>
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 58f3097888a7..4f2dc13de48b 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -3666,6 +3666,15 @@ static void nvme_remove(struct pci_dev *pdev)
> >  	nvme_stop_ctrl(&dev->ctrl);
> >  	nvme_remove_namespaces(&dev->ctrl);
> >  	nvme_dev_disable(dev, true);
> > +
> > +	if (pci_num_vf(pdev)) {
> > +		if (pci_vfs_assigned(pdev))
> > +			dev_warn(&pdev->dev,
> > +				 "WARNING: Removing PF while VFs are assigned - VFs will not be deallocated!\n");
> > +		else
> > +			pci_disable_sriov(pdev);
> > +	}
> > +
> >  	nvme_free_host_mem(dev);
> >  	nvme_dev_remove_admin(dev);
> >  	nvme_dbbuf_dma_free(dev);
> > --
> > 2.43.5
> ---end quoted text---
>