[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
Jon Hunter
jonathanh at nvidia.com
Tue Jun 30 11:17:44 EDT 2020
On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>
>>> Signed-off-by: Krishna Reddy <vdumpa at nvidia.com>
...
>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>>> + int sync, int status)
>>> +{
>>> + unsigned int delay;
>>> +
>>> + arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> + for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same on each loop?
>
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until we
> declare it wedged and give up, and by then there's not much point in
> burning power frantically hammering on the interconnect.
Ah OK. Then maybe we should move the definitions of TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into arm-smmu.h so that we can use them directly
in this file instead of redefining them. That would also make it clear
that these are part of the main driver.
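
For reference, the pattern being discussed looks roughly like this in the
main driver (paraphrasing __arm_smmu_tlb_sync() as introduced by
8513c8930069; constant and register-field names are quoted from memory, so
treat them as approximate rather than exact):

    #define TLB_LOOP_TIMEOUT	1000000	/* 1s! */
    #define TLB_SPIN_COUNT		10

    static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
				    int sync, int status)
    {
	    unsigned int spin_cnt, delay;
	    u32 reg;

	    arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
	    for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
		    /* Busy-poll first: the sync normally completes almost at once. */
		    for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
			    reg = arm_smmu_readl(smmu, page, status);
			    if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
				    return;
			    cpu_relax();
		    }
		    /*
		     * Still busy: back off exponentially rather than hammering
		     * the interconnect, since a slow sync is increasingly
		     * likely to be a wedged one.
		     */
		    udelay(delay);
	    }
	    dev_err_ratelimited(smmu->dev,
				"TLB sync timed out -- SMMU may be deadlocked\n");
    }

Pulling those two #defines into arm-smmu.h would let the NVIDIA
implementation share them instead of duplicating the values.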
>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>>> +{
>>> + unsigned int i;
>>> + struct nvidia_smmu *nvidia_smmu;
>>> + struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> + nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
>>> + if (!nvidia_smmu)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + nvidia_smmu->smmu = *smmu;
>>> + /* Instance 0 is ioremapped by arm-smmu.c after this function returns */
>>> + nvidia_smmu->num_inst = 1;
>>> +
>>> + for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> + struct resource *res;
>>> +
>>> + res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> + if (!res)
>>> + break;
>>> +
>>> + nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> + if (IS_ERR(nvidia_smmu->bases[i]))
>>> + return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> + nvidia_smmu->num_inst++;
>>> + }
>>> +
>>> + nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
>>> + /*
>>> + * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> + * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> + * allocated as part of nvidia_smmu struct.
>>> + */
>>> + devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer of the smmu struct passed to this
>> function in the nvidia_smmu struct and then we do not need to free this
>> here. In other words make ...
>>
>> struct nvidia_smmu {
>> struct arm_smmu_device *smmu;
>> unsigned int num_inst;
>> void __iomem *bases[MAX_SMMU_INSTANCES];
>> };
>>
>> This seems more appropriate than copying the struct and freeing memory
>> allocated elsewhere.
>
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?
Ah yes, of course, that is what I was missing; I wondered what was going
on here. So I think we should add a nice comment in the above function
explaining why we are copying this and cannot simply store the pointer.
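
For anyone following along, the reason the copy works at all is that
struct arm_smmu_device is embedded in the wrapper (not pointed to), so the
impl hooks can recover the NVIDIA-specific state with container_of(). A
rough sketch (the helper name here is illustrative and may not match the
patch exactly):

    struct nvidia_smmu {
	    struct arm_smmu_device	smmu;	/* embedded, not a pointer */
	    unsigned int		num_inst;
	    void __iomem		*bases[MAX_SMMU_INSTANCES];
    };

    /* Recover the wrapper from the arm_smmu_device pointer the core passes in. */
    static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
    {
	    return container_of(smmu, struct nvidia_smmu, smmu);
    }

With a pointer-only wrapper, as suggested above, there would be no way to
get from the arm_smmu_device handed to the impl hooks back to the
surrounding nvidia_smmu.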
Cheers
Jon
--
nvpublic