[RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed

Eric Auger eric.auger at linaro.org
Thu Feb 18 08:58:45 PST 2016


Hi Marc,
On 02/18/2016 04:47 PM, Marc Zyngier wrote:
> On 18/02/16 15:33, Eric Auger wrote:
>> Hi Marc,
>> On 02/18/2016 12:33 PM, Marc Zyngier wrote:
>>> On Fri, 12 Feb 2016 08:13:17 +0000
>>> Eric Auger <eric.auger at linaro.org> wrote:
>>>
>>>> In case the msi_desc references a device attached to an iommu
>>>> domain, the msi address needs to be mapped in the IOMMU. Else any
>>>> MSI write transaction will cause a fault.
>>>>
>>>> gic_set_msi_addr detects that case and allocates an iova bound
>>>> to the physical address page comprising the MSI frame. This iova
>>>> then is used as the msi_msg address. Unset operation decrements the
>>>> reference on the binding.
>>>>
>>>> The functions are called in the irq_write_msi_msg ops implementation.
>>>> At that time we can recognize whether the MSI is being set up or torn
>>>> down by looking at the msi_msg content. Indeed, msi_domain_deactivate zeroes all
>>>> the fields.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger at linaro.org>
>>>>
>>>> ---
>>>>
>>>> v2 -> v3:
>>>> - protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
>>>>   CONFIG_PHYS_ADDR_T_64BIT
>>>> - only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
>>>>   CONFIG_PCI_MSI_IRQ_DOMAIN are set.
>>>> - gic_set/unset_msi_addr duly become static
>>>> ---
>>>>  drivers/irqchip/irq-gic-common.c         | 69 ++++++++++++++++++++++++++++++++
>>>>  drivers/irqchip/irq-gic-common.h         |  5 +++
>>>>  drivers/irqchip/irq-gic-v2m.c            |  7 +++-
>>>>  drivers/irqchip/irq-gic-v3-its-pci-msi.c |  5 +++
>>>>  4 files changed, 85 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>>>> index f174ce0..46cd06c 100644
>>>> --- a/drivers/irqchip/irq-gic-common.c
>>>> +++ b/drivers/irqchip/irq-gic-common.c
>>>> @@ -18,6 +18,8 @@
>>>>  #include <linux/io.h>
>>>>  #include <linux/irq.h>
>>>>  #include <linux/irqchip/arm-gic.h>
>>>> +#include <linux/iommu.h>
>>>> +#include <linux/msi.h>
>>>>  
>>>>  #include "irq-gic-common.h"
>>>>  
>>>> @@ -121,3 +123,70 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
>>>>  	if (sync_access)
>>>>  		sync_access();
>>>>  }
>>>> +
>>>> +#if defined(CONFIG_IOMMU_API) && defined(CONFIG_PCI_MSI_IRQ_DOMAIN)
>>>> +static int gic_set_msi_addr(struct irq_data *data, struct msi_msg *msg)
>>>> +{
>>>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>>>> +	struct device *dev = msi_desc_to_dev(desc);
>>>> +	struct iommu_domain *d;
>>>> +	phys_addr_t addr;
>>>> +	dma_addr_t iova;
>>>> +	int ret;
>>>> +
>>>> +	d = iommu_get_domain_for_dev(dev);
>>>> +	if (!d)
>>>> +		return 0;
>>>> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
>>>> +	addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
>>>> +#else
>>>> +	addr = msg->address_lo;
>>>> +#endif
>>>> +
>>>> +	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
>>>> +
>>>> +	if (!ret) {
>>>> +		msg->address_lo = lower_32_bits(iova);
>>>> +		msg->address_hi = upper_32_bits(iova);
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +
>>>> +static void gic_unset_msi_addr(struct irq_data *data)
>>>> +{
>>>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>>>> +	struct device *dev;
>>>> +	struct iommu_domain *d;
>>>> +	dma_addr_t iova;
>>>> +
>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>> +	iova = ((dma_addr_t)(desc->msg.address_hi) << 32) |
>>>> +		desc->msg.address_lo;
>>>> +#else
>>>> +	iova = desc->msg.address_lo;
>>>> +#endif
>>>> +
>>>> +	dev = msi_desc_to_dev(desc);
>>>> +	if (!dev)
>>>> +		return;
>>>> +
>>>> +	d = iommu_get_domain_for_dev(dev);
>>>> +	if (!d)
>>>> +		return;
>>>> +
>>>> +	iommu_put_single_reserved(d, iova);
>>>> +}
>>>> +
>>>> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
>>>> +				  struct msi_msg *msg)
>>>> +{
>>>> +	if (!msg->address_hi && !msg->address_lo && !msg->data)
>>>> +		gic_unset_msi_addr(irq_data); /* deactivate */
>>>> +	else
>>>> +		gic_set_msi_addr(irq_data, msg); /* activate, set_affinity */
>>>> +
>>>> +	pci_msi_domain_write_msg(irq_data, msg);
>>>> +}
>>>
>>> So by doing that, you are specializing this infrastructure to PCI.
>>> If you hijacked irq_compose_msi_msg() instead, you'd have both
>>> platform and PCI MSI for the same price.
>>>
>>> I can see a potential problem with the teardown of an MSI (I don't
>>> think the compose method is called on teardown), but I think this could
>>> be easily addressed.
>> Yes, indeed, this is the reason why I moved it from
>> irq_compose_msi_msg (my original implementation) to irq_write_msi_msg. I
>> noticed I had no way to detect the teardown, whereas
>> msi_domain_deactivate also calls irq_write_msi_msg, which is quite
>> practical ;-) To be honest, I need to look further at the non-PCI
>> implementation.
> 
> Another thing to consider is that MSI controllers may use different
> doorbells for different CPU affinities.

OK thanks for pointing this out.

This also confirms that a single IOVA is not always sufficient (at some
point we wondered whether we could directly use the MSI controller guest
PA instead of having user space provide an IOVA pool).

> With your implementation,
> repeatedly changing the affinity from one CPU to another would increase
> the refcounting, and never actually lower it (you don't necessarily go
> via an "unmap").
> Of course, none of that applies to GICv2m/GICv3-ITS,
> but that's worth considering.
> 
> So I think we may need some better tracking of the IOVA we program in
> the device, and offer a generic infrastructure for this instead of
> hiding it in the MSI controller drivers.
OK I will study that.
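
To make the refcounting concern concrete, here is a minimal userspace
sketch (plain C, not kernel code) of what per-MSI tracking could look
like. Everything in it is hypothetical and only for illustration: the
binding pool, struct msi_model and the msi_program_*() helpers do not
exist in the kernel, and binding_get()/binding_put() merely stand in for
the iommu_get_single_reserved()/iommu_put_single_reserved() calls of this
series. The "leaky" variant takes a reference on every reprogramming, as
happens today on set_affinity; the "tracked" variant remembers the
binding currently programmed and drops it when the doorbell changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BINDINGS 8

/* Hypothetical model of one reserved binding: doorbell PA plus refcount. */
struct binding {
	uint64_t doorbell_pa;
	unsigned int refs;
};

static struct binding pool[MAX_BINDINGS];

/* Forget all bindings (used to start each scenario from a clean pool). */
static void pool_reset(void)
{
	for (size_t i = 0; i < MAX_BINDINGS; i++)
		pool[i].refs = 0;
}

/* Sum of all references currently held in the pool. */
static unsigned int total_refs(void)
{
	unsigned int n = 0;

	for (size_t i = 0; i < MAX_BINDINGS; i++)
		n += pool[i].refs;
	return n;
}

/* Take a reference on the binding for @pa, creating it if needed. */
static struct binding *binding_get(uint64_t pa)
{
	struct binding *free_slot = NULL;

	for (size_t i = 0; i < MAX_BINDINGS; i++) {
		if (pool[i].refs && pool[i].doorbell_pa == pa) {
			pool[i].refs++;
			return &pool[i];
		}
		if (!pool[i].refs && !free_slot)
			free_slot = &pool[i];
	}
	if (free_slot) {
		free_slot->doorbell_pa = pa;
		free_slot->refs = 1;
	}
	return free_slot; /* NULL when the pool is exhausted */
}

/* Drop one reference; a binding with zero references is free again. */
static void binding_put(struct binding *b)
{
	assert(b && b->refs > 0);
	b->refs--;
}

/* Per-MSI state: the binding currently programmed in the device. */
struct msi_model {
	struct binding *cur;
};

/* Leaky variant: each reprogramming takes a reference, none is dropped. */
static int msi_program_leaky(struct msi_model *m, uint64_t pa)
{
	m->cur = binding_get(pa);
	return m->cur ? 0 : -1;
}

/* Tracked variant: take the new reference first, then drop the old one. */
static int msi_program_tracked(struct msi_model *m, uint64_t pa)
{
	struct binding *next = binding_get(pa);

	if (!next)
		return -1;
	if (m->cur)
		binding_put(m->cur);
	m->cur = next;
	return 0;
}

/* Teardown drops whatever is currently programmed, if anything. */
static void msi_teardown(struct msi_model *m)
{
	if (m->cur) {
		binding_put(m->cur);
		m->cur = NULL;
	}
}
```

With two doorbells and an affinity that bounces between them, the leaky
variant accumulates one reference per change and teardown only releases
the last one, whereas the tracked variant never holds more than one
reference per MSI. Taking the new reference before dropping the old one
also keeps the binding alive when the doorbell does not actually change.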

Thanks for your time!

Eric
> 
> Thanks,
> 
> 	M.
> 



