[PATCH v7 03/10] iommu: introduce a reserved iova cookie

Eric Auger eric.auger at linaro.org
Fri Apr 22 06:02:01 PDT 2016


Hi Robin,
On 04/22/2016 02:36 PM, Robin Murphy wrote:
> On 20/04/16 17:14, Eric Auger wrote:
>> Hi Robin,
>> On 04/20/2016 02:55 PM, Robin Murphy wrote:
>>> On 19/04/16 17:56, Eric Auger wrote:
>>>> This patch introduces some new fields in the iommu_domain struct,
>>>> dedicated to reserved iova management.
>>>>
>>>> In a similar way as DMA mapping IOVA window, we need to store
>>>> information related to a reserved IOVA window.
>>>>
>>>> The reserved_iova_cookie will store the reserved iova_domain
>>>> handle. An RB tree indexed by physical address is introduced to
>>>> store the host physical addresses bound to reserved IOVAs.
>>>>
>>>> Those physical addresses will correspond to MSI frame base
>>>> addresses, also referred to as doorbells. Their number should be
>>>> quite limited per domain.
>>>>
>>>> Also a spin_lock is introduced to protect accesses to the iova_domain
>>>> and RB tree. The choice of a spin_lock is driven by the fact the RB
>>>> tree will need to be accessed in MSI controller code not allowed to
>>>> sleep.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger at linaro.org>
>>>>
>>>> ---
>>>> v5 -> v6:
>>>> - initialize reserved_binding_list
>>>> - use a spinlock instead of a mutex
>>>> ---
>>>>    drivers/iommu/iommu.c | 2 ++
>>>>    include/linux/iommu.h | 6 ++++++
>>>>    2 files changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index b9df141..f70ef3b 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -1073,6 +1073,8 @@ static struct iommu_domain
>>>> *__iommu_domain_alloc(struct bus_type *bus,
>>>>
>>>>        domain->ops  = bus->iommu_ops;
>>>>        domain->type = type;
>>>> +    spin_lock_init(&domain->reserved_lock);
>>>> +    domain->reserved_binding_list = RB_ROOT;
>>>>
>>>>        return domain;
>>>>    }
>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>> index b3e8c5b..60999db 100644
>>>> --- a/include/linux/iommu.h
>>>> +++ b/include/linux/iommu.h
>>>> @@ -24,6 +24,7 @@
>>>>    #include <linux/of.h>
>>>>    #include <linux/types.h>
>>>>    #include <linux/scatterlist.h>
>>>> +#include <linux/spinlock.h>
>>>>    #include <trace/events/iommu.h>
>>>>
>>>>    #define IOMMU_READ    (1 << 0)
>>>> @@ -83,6 +84,11 @@ struct iommu_domain {
>>>>        void *handler_token;
>>>>        struct iommu_domain_geometry geometry;
>>>>        void *iova_cookie;
>>>> +    void *reserved_iova_cookie;
>>>
>>> Why exactly do we need this? From your description, it's for the user of
>>> the domain to keep track of IOVA allocations in, but then that's
>>> precisely what the iova_cookie exists for.
>>
>> I was not sure whether both APIs could not be used concurrently, hence a
>> separate cookie. If we only consider MSI mapping use case I guess we are
>> either with a DMA domain or with a domain for VFIO and I would agree
>> with you, ie. we can reuse the same cookie.
> 
> Unless somebody cooks up some paravirtualised monstrosity where the
> guest driver somehow uses the host kernel's DMA mapping ops (thankfully,
> I'm not sure how that would even be possible), then they should always
> be mutually exclusive.

OK thanks
> 
> (That said, I should probably add a sanity check to
> iommu_dma_put_cookie() to ensure it only touches the cookies of
> IOMMU_DOMAIN_DMA domains...)
> 
>>>> +    /* rb tree indexed by PA, for reserved bindings only */
>>>> +    struct rb_root reserved_binding_list;
>>>
>>> Nit: that's more puzzling than helpful - "reserved binding" is
>>> particularly vague and nondescript, and makes me think of anything but
>>> MSI descriptors.
>> my heart is torn between advised genericity and MSI use case. My natural
>> short-sighted inclination would head me for an MSI mapping dedicated API
>> but I am following advices. As discussed with Alex there are
>> implementation details pretty related to MSI problematics I think (the
>> fact we store the "bindings" in an rb-tree/list, locking)
>>
>> If Marc & Alex I can retarget this API to be less generic.
>>
>>   Plus it's called a list but isn't a list (that said,
>>> given that we'd typically only expect a handful of entries, and lookups
>>> are hardly going to be a performance-critical bottleneck, would a simple
>>> list not suffice?)
>> I fully agree on that point. An rb-tree is overkill today for MSI use
>> case. Again if we were to use this API for anything else, this may
>> change the decision. But sure we can refactor afterwards upon needs. TBH
>> the rb-tree is inherited from vfio_iommu_type1 dma tree where that code
>> was originally located.
> 
> Thinking some more, how feasible would it be to handle the IOVA
> management aspect within the existing tree, i.e. extend struct vfio_dma
> so an entry can represent different types of thing - DMA pages, MSI
> pages, arbitrary reservations - and link to more implementation-specific
> data (e.g. a refcounted MSI descriptor stored elsewhere in the domain)
> as necessary?
it is feasible and  was approximately done there at the early ages of
the series:
https://lkml.org/lkml/2016/1/28/803 &
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html

Then with the intent of doing something reusable the trend was to put it
in the iommu instead of vfio_iommu_typeX.

I am currently trying to make the "msi-iommu api" implemented upon the
dma-iommu api based on the simplifications that we discussed:
- reuse iova_cookie for iova domain
- add an opaque msi_cookie that hides the msi doorbell list + its spinlock
- simplify locking by making sure the msi domain cannot disappear before
the iommu domain destruction

If you agree I would suggest to wait and see the outcome of this new
design and make a shared decision based on that code? Should be
available next week.

Best Regards

Eric

> 
> Robin.
> 
>>>
>>>> +    /* protects reserved cookie and rbtree manipulation */
>>>> +    spinlock_t reserved_lock;
>>>
>>> A cookie is an opaque structure, so any locking it needs would normally
>>> be hidden within. If on the other hand it's not meant to be opaque at
>>> this level, then it should probably be something more specific than a
>>> void * (if at all, as above).
>> agreed
>>
>> Thanks
>>
>> Eric
>>>
>>> Robin.
>>>
>>>>    };
>>>>
>>>>    enum iommu_cap {
>>>>
>>>
>>
> 




More information about the linux-arm-kernel mailing list