[PATCH v6 3/5] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings

Tue Jan 2 22:24:53 PST 2024

On 12/21/2023 4:02 PM, Dmitry Baryshkov wrote:
> On Thu, 21 Dec 2023 at 12:02, Bibek Kumar Patro
> <quic_bibekkum at quicinc.com> wrote:
>>
>>
>>
>> On 12/21/2023 6:06 AM, Dmitry Baryshkov wrote:
>>> On Wed, 20 Dec 2023 at 15:39, Bibek Kumar Patro
>>> <quic_bibekkum at quicinc.com> wrote:
>>>>
>>>> Currently in Qualcomm  SoCs the default prefetch is set to 1 which allows
>>>> the TLB to fetch just the next page table. MMU-500 features ACTLR
>>>> register which is implementation defined and is used for Qualcomm SoCs
>>>> to have a custom prefetch setting enabling TLB to prefetch the next set
>>>> of page tables accordingly allowing for faster translations.
>>>>
>>>> ACTLR value is unique for each SMR (Stream matching register) and stored
>>>> in a pre-populated table. This value is set to the register during
>>>> context bank initialisation.
>>>>
>>>> Signed-off-by: Bibek Kumar Patro <quic_bibekkum at quicinc.com>
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 45 ++++++++++++++++++++++
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h |  6 ++-
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu.c      |  5 ++-
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu.h      |  5 +++
>>>>    4 files changed, 58 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> index 20c9836d859b..1cefdd0ca110 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> @@ -24,6 +24,12 @@
>>>>    #define CPRE                   (1 << 1)
>>>>    #define CMTLB                  (1 << 0)
>>>>
>>>> +struct actlr_config {
>>>> +       u16 sid;
>>>> +       u16 mask;
>>>> +       u32 actlr;
>>>> +};
>>>> +
>>>>    static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
>>>>    {
>>>>           return container_of(smmu, struct qcom_smmu, smmu);
>>>> @@ -215,9 +221,38 @@ static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu)
>>>>           return true;
>>>>    }
>>>>
>>>> +static void qcom_smmu_set_actlr(struct device *dev, struct arm_smmu_device *smmu, int cbndx,
>>>> +               const struct actlr_config *actlrcfg)
>>>> +{
>>>> +       struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
>>>> +       struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>>>> +       struct arm_smmu_smr *smr;
>>>> +       u16 mask;
>>>> +       int idx;
>>>> +       u16 id;
>>>> +       int i;
>>>> +
>>>> +       for (; actlrcfg->sid || actlrcfg->mask || actlrcfg->actlr; actlrcfg++) {
>>>> +               id = actlrcfg->sid;
>>>> +               mask = actlrcfg->mask;
>>>> +
>>>> +               for_each_cfg_sme(cfg, fwspec, i, idx) {
>>>> +                       smr = &smmu->smrs[idx];
>>>> +                       if (smr_is_subset(smr, id, mask)) {
>>>> +                               arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR,
>>>> +                                               actlrcfg->actlr);
>>>> +                               break;
>>>> +                       }
>>>> +               }
>>>> +       }
>>>> +}
>>>> +
>>>>    static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>>>                   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
>>>>    {
>>>> +       struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>> +       struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>>> +       int cbndx = smmu_domain->cfg.cbndx;
>>>>           struct adreno_smmu_priv *priv;
>>>>
>>>>           smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;
>>>> @@ -248,6 +283,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>>>           priv->set_stall = qcom_adreno_smmu_set_stall;
>>>>           priv->resume_translation = qcom_adreno_smmu_resume_translation;
>>>>
>>>> +       if (qsmmu->data->actlrcfg_gfx)
>>>> +               qcom_smmu_set_actlr(dev, smmu, cbndx, qsmmu->data->actlrcfg_gfx);
>>>
>>> There was a feedback point against v4 that there can be more than two
>>> (apps + gpu) SMMU devices. No, we can not use additional compat
>>> strings, the SMMU units are compatible with each other.
>>
>> Just to understand better, did you mean if in the below check
>> [inside arm-smmu-qcom.c file during qcom_smmu_create()], "else" has two
>> things? (Currently adreno_impl for gpu smmu, else for only
>> apps smmu)
> 
> qcom,adreno-smmu is quite unique here, this is the only distinctive
> substring. We do not have such compat strings for any other of SMMU
> nodes.
> 

Apologies for the delayed response, I was on leave and could
not monitor the replies.

for other SMMUs (except qcom-adreno-smmu) we would
need to use the IO address matching.

>>
>>            if (np && of_device_is_compatible(np, "qcom,adreno-smmu"))
>>                    impl = data->adreno_impl;
>>            else
>>                    impl = data->impl;
>>
>>> Please add
>>> matching between the smmu and particular actlr table using the IO
>>> address of the SMMU block.
>>>
>>
>> The ACTLR table for each smmu will have A IO address attached, so based
>> on IO address we can apply ACTLR.
>> Is this your proposal((IMO hardcoding IO in driver won't be viable,
>> isn't it?), or in smmu DT we would need to set the IO?
> 
> Unfortunately, I meant exactly that: hardcoding addresses of the SMMU
> register spaces. see drivers/gpu/drm/msm/dsi_cfg.c
> Then during device probe the driver can match the IO address to the
> list of the per-platform ACTLR tables and select the correct one.
> Then you don't even need a special actlrcfg_gfx. The GFX will fall
> into the main schema.
> 

Thanks for the reference, I will check once and try to evaluate a
similar implementation for ACTLR table as well.

Thanks,
Bibek

>>
>>
>> Thanks & regards,
>> Bibek
>>
>>>> +
>>>>           return 0;
>>>>    }
>>>>
>>>> @@ -274,6 +312,13 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
>>>>    static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>>>                   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
>>>>    {
>>>> +       struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>> +       struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>>> +       int cbndx = smmu_domain->cfg.cbndx;
>>>> +
>>>> +       if (qsmmu->data->actlrcfg)
>>>> +               qcom_smmu_set_actlr(dev, smmu, cbndx, qsmmu->data->actlrcfg);
>>>> +
>>>
>>> One issue occured to me, while I was reviewing the patchset. The ACTLR
>>> settings are related to the whole SMMU setup, but we are applying them
>>> each time there is an SMMU context init (in other words, one per each
>>> domain). Is that correct? Or it's just that there is no better place
>>> for initialising the global register set? Would it be better to
>>> reprogram the ACTLR registers which are related just to this
>>> particular domain?
>>>
>>>>           smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;
>>>>
>>>>           return 0;
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
>>>> index f3b91963e234..cb4cb402c202 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
>>>> @@ -1,6 +1,6 @@
>>>>    /* SPDX-License-Identifier: GPL-2.0-only */
>>>>    /*
>>>> - * Copyright (c) 2022, Qualcomm Innovation Center, Inc. All rights reserved.
>>>> + * Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved.
>>>>     */
>>>>
>>>>    #ifndef _ARM_SMMU_QCOM_H
>>>> @@ -24,7 +24,11 @@ struct qcom_smmu_config {
>>>>           const u32 *reg_offset;
>>>>    };
>>>>
>>>> +struct actlr_config;
>>>> +
>>>>    struct qcom_smmu_match_data {
>>>> +       const struct actlr_config *actlrcfg;
>>>> +       const struct actlr_config *actlrcfg_gfx;
>>>>           const struct qcom_smmu_config *cfg;
>>>>           const struct arm_smmu_impl *impl;
>>>>           const struct arm_smmu_impl *adreno_impl;
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> index d6d1a2a55cc0..0c7f700b27dd 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> @@ -990,9 +990,10 @@ static int arm_smmu_find_sme(struct arm_smmu_device *smmu, u16 id, u16 mask)
>>>>                    * expect simply identical entries for this case, but there's
>>>>                    * no harm in accommodating the generalisation.
>>>>                    */
>>>> -               if ((mask & smrs[i].mask) == mask &&
>>>> -                   !((id ^ smrs[i].id) & ~smrs[i].mask))
>>>> +
>>>> +               if (smr_is_subset(&smrs[i], id, mask))
>>>>                           return i;
>>>> +
>>>>                   /*
>>>>                    * If the new entry has any other overlap with an existing one,
>>>>                    * though, then there always exists at least one stream ID
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
>>>> index 703fd5817ec1..2e4f65412c6b 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
>>>> @@ -501,6 +501,11 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
>>>>                   writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);
>>>>    }
>>>>
>>>> +static inline bool smr_is_subset(struct arm_smmu_smr *smrs, u16 id, u16 mask)
>>>> +{
>>>> +       return (mask & smrs->mask) == mask && !((id ^ smrs->id) & ~smrs->mask);
>>>> +}
>>>> +
>>>>    #define ARM_SMMU_GR0           0
>>>>    #define ARM_SMMU_GR1           1
>>>>    #define ARM_SMMU_CB(s, n)      ((s)->numpage + (n))
>>>> --
>>>> 2.17.1
>>>>
>>>
>>>
> 
> 
>