ath11k and vfio-pci support

Baochen Qiang quic_bqiang at quicinc.com
Tue Jan 16 21:47:28 PST 2024



On 1/16/2024 6:41 PM, David Woodhouse wrote:
> On Tue, 2024-01-16 at 18:08 +0800, Baochen Qiang wrote:
>>
>>
>> On 1/16/2024 1:46 AM, Alex Williamson wrote:
>>> On Sun, 14 Jan 2024 16:36:02 +0200
>>> Kalle Valo <kvalo at kernel.org> wrote:
>>>
>>>> Baochen Qiang <quic_bqiang at quicinc.com> writes:
>>>>
>>>>>>> Strange that it still fails. Are you now seeing this error in your
>>>>>>> host, in your Qemu, or both?
>>>>>>> Could you share your test steps? If you can, please be as
>>>>>>> detailed as possible, since I'm not familiar with passing WLAN
>>>>>>> hardware to a VM using vfio-pci.
>>>>>>
>>>>>> Just in Qemu; the hardware works fine on my host machine.
>>>>>> I basically followed this guide to set it up. It's written in the
>>>>>> context of GPUs/libvirt, but the host setup is exactly the same. By
>>>>>> no means do you need to read it all; once you set the vfio-pci.ids
>>>>>> and see your unclaimed adapter you can stop:
>>>>>> https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF
>>>>>> In short, you should be able to set the following host kernel options
>>>>>> and reboot (assuming your motherboard/hardware is compatible):
>>>>>> intel_iommu=on iommu=pt vfio-pci.ids=17cb:1103
>>>>>> Obviously change the device/vendor IDs to whatever ath11k hw you
>>>>>> have. Once the host is rebooted you should see your wlan adapter as
>>>>>> UNCLAIMED, with the driver in use shown as vfio-pci. If not, it's likely
>>>>>> your motherboard just isn't compatible; the device has to be in its
>>>>>> own IOMMU group (you could try switching PCI ports if this is the
>>>>>> case).
>>>>>> I then build a "kvm_guest.config" kernel with the driver/firmware
>>>>>> for ath11k and boot into that with the following Qemu options:
>>>>>> -enable-kvm -device vfio-pci,host=<PCI address>
>>>>>> If it seems easier you could also utilize IWD's test-runner, which
>>>>>> handles launching the Qemu kernel automatically, detects any
>>>>>> vfio devices and passes them through, and mounts some useful host
>>>>>> folders into the VM. It's actually a very good general-purpose tool
>>>>>> for kernel testing, not just for IWD:
>>>>>> https://git.kernel.org/pub/scm/network/wireless/iwd.git/tree/doc/test-runner.txt
>>>>>> Once set up you can just run test-runner with a few flags and you'll
>>>>>> boot into a shell:
>>>>>> ./tools/test-runner -k <kernel-image> --hw --start /bin/bash
>>>>>> Please reach out if you have questions, thanks for looking into
>>>>>> this.
>>>>>
>>>>> Thanks for these details. I reproduced this issue by following your guide.
>>>>>
>>>>> It seems the root cause is that the MSI vector assigned to WCN6855 in
>>>>> qemu is different from that in the host. In my case the MSI vector in qemu
>>>>> is [Address: fee00000  Data: 0020] while in the host it is [Address:
>>>>> fee00578  Data: 0000]. So in qemu ath11k configures the MSI vector
>>>>> [Address: fee00000  Data: 0020] into the WCN6855 hardware/firmware, and
>>>>> the firmware uses that vector to fire interrupts at the host/qemu. However
>>>>> the host IOMMU doesn't know that vector, because the real vector is
>>>>> [Address: fee00578  Data: 0000]; as a result the host blocks that
>>>>> interrupt and reports an error, see the log below:
>>>>>
>>>>> [ 1414.206069] DMAR: DRHD: handling fault status reg 2
>>>>> [ 1414.206081] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
>>>>> 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
>>>>> request
>>>>> [ 1414.210334] DMAR: DRHD: handling fault status reg 2
>>>>> [ 1414.210342] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
>>>>> 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
>>>>> request
>>>>> [ 1414.212496] DMAR: DRHD: handling fault status reg 2
>>>>> [ 1414.212503] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
>>>>> 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
>>>>> request
>>>>> [ 1414.214600] DMAR: DRHD: handling fault status reg 2
>>>>>
>>>>> While I don't think there is a way for qemu/ath11k to get the real MSI
>>>>> vector from the host, I will try to read the vfio code to check further.
>>>>> Before that, to unblock you, a possible hack is to hard-code the MSI
>>>>> vector in qemu to the same value as in the host, on the condition that
>>>>> the host MSI vector doesn't change.
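
For reference, a rough sketch (not the exact ath11k code) of how a PCI
driver picks up the MSI address/data pair from the standard MSI
capability. Inside a VM these config space reads return whatever the
VMM emulates, not what the host programmed into the real device, which
is exactly where the mismatch above comes from once the driver hands
the values to firmware:

#include <linux/pci.h>

/*
 * Sketch only: read the MSI target address/data from the standard
 * MSI capability in config space.
 */
static void read_msi_target(struct pci_dev *pdev,
			    u32 *addr_lo, u32 *addr_hi, u16 *data)
{
	u16 ctrl;

	pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_FLAGS, &ctrl);
	pci_read_config_dword(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, addr_lo);

	if (ctrl & PCI_MSI_FLAGS_64BIT) {
		pci_read_config_dword(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI, addr_hi);
		pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_DATA_64, data);
	} else {
		*addr_hi = 0;
		pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_DATA_32, data);
	}
}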
>>>>
>>>> Baochen, awesome that you were able to debug this further. Now we at
>>>> least know what the problem is.
>>>
>>> It's an interesting problem; I don't think we've seen another device
>>> where the driver reads the MSI register in order to program another
>>> hardware entity to match the MSI address and data configuration.
>>>
>>> When assigning a device, the host and guest use entirely separate
>>> address spaces for MSI interrupts.  When the guest enables MSI, the
>>> operation is trapped by the VMM and triggers an ioctl to the host to
>>> perform an equivalent configuration.  Generally the physical device
>>> will interrupt within the host where it may be directly attached to KVM
>>> to signal the interrupt, trigger through the VMM, or where
>>> virtualization hardware supports it, the interrupt can directly trigger
>>> the vCPU.   From the VM perspective, the guest address/data pair is used
>>> to signal the interrupt, which is why it makes sense to virtualize the
>>> MSI registers.
>>
>> Hi Alex, could you elaborate a bit more? Why is MSI virtualization
>> necessary from the VM perspective?
> 
> An MSI is just a write to physical memory space. You can even use it
> like that; configure the device to just write 4 bytes to some address
> in a struct in memory to show that it needs attention, and you then
> poll that memory.
> 
> But mostly we don't (ab)use it like that, of course. We tell the device
> to write to a special range of the physical address space where the
> interrupt controller lives — the range from 0xfee00000 to 0xfeefffff.
> The low 20 bits of the address, and the 32 bits of data written to that
> address, tell the interrupt controller which CPU to interrupt, and
> which vector to raise on the CPU (as well as some other details and
> weird interrupt modes which are theoretically encodable).
> 
> So in your example, the guest writes [Address: fee00000  Data: 0020]
> which means it wants vector 0x20 on CPU#0 (well, the CPU with APICID
> 0). But that's what the *guest* wants. If we just blindly programmed
> that into the hardware, the hardware would deliver vector 0x20 to the
> host's CPU0... which would be very confused by it.
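
To make that decoding concrete, here is a tiny user-space sketch that
decodes a Compatibility Format address/data pair (only the commonly
used fields):

#include <stdio.h>
#include <stdint.h>

/* Decode an x86 MSI address/data pair in Compatibility Format. */
static void decode_msi(uint32_t addr, uint32_t data)
{
	unsigned int dest  = (addr >> 12) & 0xff; /* target APIC ID           */
	unsigned int vec   = data & 0xff;         /* vector raised on the CPU */
	unsigned int dmode = (data >> 8) & 0x7;   /* delivery mode, 0 = fixed */

	printf("APIC ID %u, vector 0x%02x, delivery mode %u\n", dest, vec, dmode);
}

int main(void)
{
	decode_msi(0xfee00000, 0x0020);	/* the guest's pair: APIC ID 0, vector 0x20 */
	return 0;
}

(The host's pair from my log, fee00578/0000, has the remappable-format
bit set in the address, so this decode does not apply to it; more on
that below.)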
> 
> The host has a driver for that device, probably the VFIO driver. The
> host registers its own interrupt handlers for the real hardware,
> decides which *host* CPU (and vector) should be notified when something
> happens. And when that happens, the VFIO driver will raise an event on
> an eventfd, which will notify QEMU to inject the appropriate interrupt
> into the guest.
> 
> So... when the guest enables the MSI, that's trapped by QEMU which
> remembers which *guest* CPU/vector the interrupt should go to. QEMU
> tells VFIO to enable the corresponding interrupt, and what gets
> programmed into the actual hardware is up to the *host* operating
> system; nothing to do with the guest's information at all.
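
At the ioctl level, the "QEMU tells VFIO" step looks roughly like the
sketch below (user space, error handling omitted; not QEMU's actual
code):

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <linux/vfio.h>

/*
 * Sketch: enable one MSI vector on an assigned device and have VFIO
 * signal an eventfd when it fires. The host kernel chooses the
 * physical MSI address/data; the guest-visible values never reach the
 * hardware.
 */
static int vfio_enable_msi(int device_fd)
{
	int efd = eventfd(0, EFD_CLOEXEC);
	size_t sz = sizeof(struct vfio_irq_set) + sizeof(int);
	struct vfio_irq_set *set = calloc(1, sz);

	set->argsz = sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_MSI_IRQ_INDEX;
	set->start = 0;
	set->count = 1;
	memcpy(set->data, &efd, sizeof(int));

	if (ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set) < 0)
		efd = -1;

	free(set);
	return efd;	/* the VMM polls this fd, or hands it to KVM (see below) */
}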
> 
> Then when the actual hardware raises the interrupt, the VFIO interrupt
> handler runs in the host, signals an event on the eventfd, and QEMU
> receives that and injects the event into the appropriate guest vCPU.
> 
> (In practice QEMU doesn't do it these days; there's actually a shortcut
> which improves latency by allowing the kernel to deliver the event to
> the guest directly, connecting the eventfd directly to the KVM irq
> routing table.)
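
That shortcut is the irqfd mechanism; roughly (assuming the guest's MSI
has already been described to KVM as a GSI via KVM_SET_GSI_ROUTING):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: wire the VFIO eventfd straight into KVM's irq routing so the
 * host kernel injects the guest interrupt without a round trip through
 * the VMM.
 */
static int kvm_attach_irqfd(int vm_fd, int efd, unsigned int gsi)
{
	struct kvm_irqfd irqfd = {
		.fd  = efd,
		.gsi = gsi,
	};

	return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}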
> 
> 
> Interrupt remapping is probably not important here, but I'll explain it
> briefly anyway. With interrupt remapping, the IOMMU handles the
> 'memory' write from the device, just as it handles all other memory
> transactions. One of the reasons for interrupt remapping is that the
> original definitions of the bits in the MSI (the low 20 bits of the
> address and the 32 bits of what's written) only had 8 bits for the
> target CPU APICID. And we have bigger systems than that now.
> 
> So by using one of the spare bits in the MSI message, we can indicate
> that this isn't just a directly-encoded cpu/vector in "Compatibility
> Format", but is a "Remappable Format" interrupt. Instead of the
> cpu/vector it just contains an index into the IOMMU's Interrupt
> Redirection Table, which *does* have a full 32 bits for the target APIC
> ID. That's why x2apic support (which gives us support for >254 CPUs)
> depends on interrupt remapping.
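
If I read the Intel IOMMU layout correctly, the host value from my log
is such a Remappable Format message; here is a small sketch that pulls
out the remapping table index (address bit 4 = remappable format, bit 3
= subhandle valid, index in bits 19:5 plus bit 2):

#include <stdio.h>
#include <stdint.h>

/* Sketch: decode a Remappable Format MSI address (Intel IOMMU). */
static void decode_remappable(uint32_t addr)
{
	if (!(addr & (1u << 4))) {
		printf("compatibility format, not remapped\n");
		return;
	}

	unsigned int index = ((addr >> 5) & 0x7fff) | (((addr >> 2) & 0x1) << 15);

	printf("remappable format, IRTE index %u, SHV=%u\n",
	       index, (unsigned int)((addr >> 3) & 1));
}

int main(void)
{
	decode_remappable(0xfee00578);	/* the host's address from my log */
	return 0;
}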
> 
> The other thing that the IOMMU can do in modern systems is *posted*
> interrupts. Where the entry in the IOMMU's IRT doesn't just specify the
> host's CPU/vector, but actually specifies a *vCPU* to deliver the
> interrupt to.
> 
> All of which is mostly irrelevant as it's just another bypass
> optimisation to improve latency. The key here is that what the guest
> writes to its emulated MSI table and what the host writes to the real
> hardware are not at all related.
> 
Thanks. A really detailed and clear explanation.

> If we had had this posted interrupt support from the beginning, perhaps
> we could have had a much simpler model — we just let the guest write
> its intended (v)CPU#/vector *directly* to the MSI table in the device,
> and let the IOMMU fix it up by having a table pointing to the
> appropriate set of vCPUs. But that isn't how it happened. The model we
> have is that the VMM has to *emulate* the config space and handle the
> interrupts as described above.
> 
> This means that whenever a device has a non-standard way of configuring
> MSIs, the VMM has to understand and intercept that. I believe we've
> even seen some Atheros devices with the MSI target in some weird MMIO
> registers instead of the standard location, so we've had to hack QEMU
> to handle those too?
> 
>> And, maybe a stupid question: is it possible for VM/KVM or vfio to only
>> virtualize the write operation to the MSI register but leave the read
>> operation un-virtualized? I am asking because that way ath11k may get a
>> chance to run in a VM after reading back the real vector.
> 
> That might confuse a number of operating systems. Especially if they
> mask/unmask by reading the register, flipping the mask bit and writing
> back again.
> 
> How exactly is the content of this register then given back to the
> firmware? Is that communication snoopable by the VMM?
By programming it into an MMIO register. It is a non-standard and
device-specific register; I'm not sure whether it is snoopable by the VMM.
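
To illustrate what I mean, the handoff is essentially the following
(register names and offsets here are made-up placeholders, not the real
WCN6855/ath11k layout):

#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical offsets, NOT the real ath11k/WCN6855 registers. */
#define HYP_FW_MSI_ADDR_LO	0x1000
#define HYP_FW_MSI_ADDR_HI	0x1004
#define HYP_FW_MSI_DATA		0x1008

/*
 * The driver copies the MSI address/data it read from config space
 * into device MMIO registers; the firmware then generates MSI writes
 * using those values directly, bypassing the standard MSI capability,
 * so the VMM never gets a chance to fix them up.
 */
static void hyp_program_fw_msi(void __iomem *mmio,
			       u32 addr_lo, u32 addr_hi, u16 data)
{
	writel(addr_lo, mmio + HYP_FW_MSI_ADDR_LO);
	writel(addr_hi, mmio + HYP_FW_MSI_ADDR_HI);
	writel(data, mmio + HYP_FW_MSI_DATA);
}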

> 
> 
>>>
>>> Off hand I don't have a good solution for this; the hardware is
>>> essentially imposing a unique requirement for MSI programming, namely
>>> that the driver needs visibility of the physical MSI address and data.
>>>
> 
> Strictly, the driver doesn't need visibility of the actual values used
> by the hardware. Another way of looking at it would be to say that
> the driver programs the MSI through this non-standard method; it just
> needs the VMM to trap and handle that, just as the VMM does for the
> standard MSI table.
> 
> Which is what I thought we'd already seen on some Atheros devices.
> 
>>>    It's
>>> conceivable that device-specific code could either make the physical
>>> address/data pair visible to the VM or trap the firmware programming to
>>> inject the correct physical values.  Is there somewhere other than the
>>> standard MSI capability in config space that the driver could learn the
>>> physical values, i.e. somewhere that isn't virtualized?  Thanks,
>>
>> I don't think we have such a capability in configuration space.
> 
> Configuration space is a complete fiction though; it's all emulated. We
> can do anything we like. Or we can have a PV hypercall which will
> report it. I don't know that we'd *want* to, but all things are
> possible.
OK, I get the point now.

> 


