[PATCH 3/3] um: Add VFIO-based virtual PCI driver
Tiwei Bie
tiwei.btw at antgroup.com
Thu Mar 20 08:16:27 PDT 2025
On 2025/3/20 16:50, Johannes Berg wrote:
> On Sun, 2025-03-16 at 00:19 +0800, Tiwei Bie wrote:
>> Implement a new virtual PCI driver based on the VFIO framework.
>> This driver allows users to pass through PCI devices to UML via
>> VFIO. Currently, only MSI-X capable devices are supported, and
>> it is assumed that drivers will use MSI-X.
>
> Seems nice, but I haven't tested it now :)
Thanks! :)
>
>> +static irqreturn_t uml_vfio_interrupt(int unused, void *opaque)
>> +{
>> + struct uml_vfio_intr_ctx *ctx = opaque;
>> + struct uml_vfio_device *dev = ctx->dev;
>> + int index = ctx - dev->intr_ctx;
>> + int irqfd = dev->udev.irqfd[index];
>> + int irq = dev->msix_data[index];
>> + uint64_t v;
>> + int r;
>> +
>> + do {
>> + r = os_read_file(irqfd, &v, sizeof(v));
>> + if (r == sizeof(v))
>> + generic_handle_irq(irq);
>> + } while (r == sizeof(v) || r == -EINTR);
>> + WARN(r != -EAGAIN, "read returned %d\n", r);
>> +
>> + return IRQ_HANDLED;
>> +}
>
> This seems mostly right - looks like it's an eventfd.
Yeah, it's an eventfd.
>
>> + irqfd = uml_vfio_user_activate_irq(&dev->udev, index);
>> + if (irqfd < 0)
>> + return irqfd;
>> +
>> + ctx->irq = um_request_irq(UM_IRQ_ALLOC, irqfd, IRQ_READ,
>> + uml_vfio_interrupt, IRQF_SHARED,
>> + "vfio-uml", ctx);
>
> However I'm not sure it's right with IRQF_SHARED since you don't detect
> if it was actually not your interrupt that fired? We don't really have
> any good reason to have shared interrupts anyway though, and I'm not
> sure we even *can* share interrupts in the lower layers in ARCH=um, so
> I'm not sure it matters?
Makes sense. Will fix it. Thanks!
>
>> + err = add_sigio_fd(irqfd);
>> + if (err)
>> + goto free_irq;
>
> doesn't the irq framework do that automatically?
Currently it's not supported. Users need to add it manually, e.g.:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/rtc_user.c?h=v6.14-rc7#n43
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/rtc_kern.c?h=v6.14-rc7#n127
>
>> +static int uml_vfio_deactivate_irq(struct uml_vfio_device *dev, int index)
>> +{
>> + struct uml_vfio_intr_ctx *ctx = &dev->intr_ctx[index];
>> +
>> + if (ctx->irq >= 0) {
>> + ignore_sigio_fd(dev->udev.irqfd[index]);
>> + um_free_irq(ctx->irq, ctx);
>
> and here too?
>
>> +static int uml_vfio_update_msix_cap(struct uml_vfio_device *dev,
>> + unsigned int offset, int size,
>> + unsigned long val)
>> +{
>> + int err = 0;
>> +
>> + if (size == 2 && offset == dev->msix_cap + PCI_MSIX_FLAGS) {
>> + switch (val & ~PCI_MSIX_FLAGS_QSIZE) {
>> + case PCI_MSIX_FLAGS_ENABLE:
>> + case 0:
>> + err = uml_vfio_user_update_irqs(&dev->udev);
>> + break;
>> + }
>> + }
>> +
>> + return err;
>
> do you really want to ignore and return 0 if it wasn't supported?
I just realized that this part could be confusing. It handles only some
essential cmds and ignores others. I'll add some comments to clarify.
>
>> +static int uml_vfio_update_msix_table(struct uml_vfio_device *dev,
>> + unsigned int offset, int size,
>> + unsigned long val)
>> +{
>> + int index;
>> +
>> + offset -= dev->msix_offset + PCI_MSIX_ENTRY_DATA;
>> +
>> + if (size != 4 || offset % PCI_MSIX_ENTRY_SIZE != 0)
>> + return 0;
>
> here too?
>
>> +static void uml_vfio_cfgspace_write(struct um_pci_device *pdev,
>> + unsigned int offset, int size,
>> + unsigned long val)
>> +{
>> + struct uml_vfio_device *dev = to_vdev(pdev);
>> +
>> + if (offset < dev->msix_cap + PCI_CAP_MSIX_SIZEOF &&
>> + offset + size > dev->msix_cap)
>
> I'd probably indent the second line just to after "if ("?
Sure. Will do.
>
>> + if (bar == dev->msix_bar && offset + size > dev->msix_offset &&
>> + offset < dev->msix_offset + dev->msix_size)
>> + WARN_ON(uml_vfio_update_msix_table(dev, offset, size, val));
>
> likewise
>
>> +static u8 uml_vfio_find_capability(struct uml_vfio_device *dev, u8 cap)
>> +{
>> + u8 id, pos;
>> + u16 ent;
>> + int ttl = 48;
>
> any particular significance in that value?
I borrowed the value of PCI_FIND_CAP_TTL from drivers/pci/pci.h,
which is an internal header and can't be included from here. I'll
add a comment to it.
>
>> +int uml_vfio_user_setup_iommu(int container)
>> +{
>> + unsigned long reserved = uml_reserved - uml_physmem;
>> + struct vfio_iommu_type1_dma_map dma_map = {
>> + .argsz = sizeof(dma_map),
>> + .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
>> + .vaddr = uml_reserved,
>> + .iova = reserved,
>> + .size = physmem_size - reserved,
>> + };
>
> maybe point over to the big comment in vhost_user_set_mem_table() from
> virtio_uml.c? Same things apply here I guess.
Good idea. Will do.
Thanks again for the review! :)
Regards,
Tiwei
More information about the linux-um
mailing list