[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
Jon Hunter
jonathanh at nvidia.com
Tue Jun 30 11:17:44 EDT 2020
On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>
>>> Signed-off-by: Krishna Reddy <vdumpa at nvidia.com>
...
>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>>> + int sync, int status)
>>> +{
>>> + unsigned int delay;
>>> +
>>> + arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> + for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same on each loop?
>
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until we
> declare it wedged and give up, and by then there's not much point in
> burning power frantically hammering on the interconnect.
Ah OK. Then maybe we should move the definitions of TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into arm-smmu.h so that we can use them directly
in this file instead of redefining them. That would also make it clear
that these are part of the main driver.
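
For reference, the pattern being discussed looks roughly like this in the
main driver (paraphrasing __arm_smmu_tlb_sync() as introduced by
8513c8930069; constant and register-field names are quoted from memory, so
treat them as approximate rather than exact):

    #define TLB_LOOP_TIMEOUT	1000000	/* 1s! */
    #define TLB_SPIN_COUNT		10

    static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
				    int sync, int status)
    {
	    unsigned int spin_cnt, delay;
	    u32 reg;

	    arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
	    for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
		    /* Busy-poll first: the sync normally completes almost at once. */
		    for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
			    reg = arm_smmu_readl(smmu, page, status);
			    if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
				    return;
			    cpu_relax();
		    }
		    /*
		     * Still busy: back off exponentially rather than hammering
		     * the interconnect, since a slow sync is increasingly
		     * likely to be a wedged one.
		     */
		    udelay(delay);
	    }
	    dev_err_ratelimited(smmu->dev,
				"TLB sync timed out -- SMMU may be deadlocked\n");
    }

Pulling those two #defines into arm-smmu.h would let the NVIDIA
implementation share them instead of duplicating the values.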
>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>>> +{
>>> + unsigned int i;
>>> + struct nvidia_smmu *nvidia_smmu;
>>> + struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> + nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
>>> + if (!nvidia_smmu)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + nvidia_smmu->smmu = *smmu;
>>> + /* Instance 0 is ioremapped by arm-smmu.c after this function returns */
>>> + nvidia_smmu->num_inst = 1;
>>> +
>>> + for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> + struct resource *res;
>>> +
>>> + res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> + if (!res)
>>> + break;
>>> +
>>> + nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> + if (IS_ERR(nvidia_smmu->bases[i]))
>>> + return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> + nvidia_smmu->num_inst++;
>>> + }
>>> +
>>> + nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
>>> + /*
>>> + * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> + * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> + * allocated as part of nvidia_smmu struct.
>>> + */
>>> + devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer of the smmu struct passed to this
>> function in the nvidia_smmu struct and then we do not need to free this
>> here. In other words make ...
>>
>> struct nvidia_smmu {
>> struct arm_smmu_device *smmu;
>> unsigned int num_inst;
>> void __iomem *bases[MAX_SMMU_INSTANCES];
>> };
>>
>> This seems more appropriate than copying the struct and freeing memory
>> allocated elsewhere.
>
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?
Ah yes, of course, that is what I was missing; I wondered what was going
on here. So I think we should add a nice comment in the above function
explaining why we are copying this and cannot simply store the pointer.
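
For anyone following along, the reason the copy works at all is that
struct arm_smmu_device is embedded in the wrapper (not pointed to), so the
impl hooks can recover the NVIDIA-specific state with container_of(). A
rough sketch (the helper name here is illustrative and may not match the
patch exactly):

    struct nvidia_smmu {
	    struct arm_smmu_device	smmu;	/* embedded, not a pointer */
	    unsigned int		num_inst;
	    void __iomem		*bases[MAX_SMMU_INSTANCES];
    };

    /* Recover the wrapper from the arm_smmu_device pointer the core passes in. */
    static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
    {
	    return container_of(smmu, struct nvidia_smmu, smmu);
    }

With a pointer-only wrapper, as suggested above, there would be no way to
get from the arm_smmu_device handed to the impl hooks back to the
surrounding nvidia_smmu.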
Cheers
Jon
--
nvpublic