[PATCH v4] iommu/arm-smmu-v3: permit users to disable msi polling

Song Bao Hua (Barry Song) song.bao.hua at hisilicon.com
Tue Aug 18 19:54:26 EDT 2020



> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy at arm.com]
> Sent: Wednesday, August 19, 2020 2:31 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>; will at kernel.org;
> joro at 8bytes.org
> Cc: Zengtao (B) <prime.zeng at hisilicon.com>;
> iommu at lists.linux-foundation.org; linux-arm-kernel at lists.infradead.org;
> Linuxarm <linuxarm at huawei.com>
> Subject: Re: [PATCH v4] iommu/arm-smmu-v3: permit users to disable msi
> polling
> 
> On 2020-08-18 12:17, Barry Song wrote:
> > Polling by MSI isn't necessarily faster than polling by SEV. Tests on
> > hi1620 show hns3 100G NIC network throughput can improve from 25G to
> > 27G if we disable MSI polling while running 16 netperf threads sending
> > UDP packets in size 32KB. TX throughput can improve from 7G to 7.7G for
> > single thread.
> > The reason for the throughput improvement is that the latency to poll
> > the completion of CMD_SYNC becomes smaller. After sending a CMD_SYNC
> > in an empty cmd queue, typically we need to wait for 280ns using MSI
> > polling. But we only need around 190ns after disabling MSI polling.
> > This patch provides a command line option so that users can decide to
> > use MSI polling or not based on their tests.
> >
> > Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>
> > ---
> >   -v4: rebase on top of 5.9-rc1
> >   refine changelog
> >
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18
> ++++++++++++++----
> >   1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 7196207be7ea..89d3cb391fef 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -418,6 +418,11 @@ module_param_named(disable_bypass,
> disable_bypass, bool, S_IRUGO);
> >   MODULE_PARM_DESC(disable_bypass,
> >   	"Disable bypass streams such that incoming transactions from devices
> that are not attached to an iommu domain will report an abort back to the
> device and will not be allowed to pass through the SMMU.");
> >
> > +static bool disable_msipolling;
> > +module_param_named(disable_msipolling, disable_msipolling, bool,
> S_IRUGO);
> 
> Just use module_param() - going out of the way to specify a "different"
> name that's identical to the variable name is silly.

Thanks for pointing out, also fixed the same issue in the existing parameter
disable_bypass in the new patchset.

But I am sorry I made a typo, the new patchset should be v5. But I wrote v4.

> 
> Also I think the preference these days is to specify permissions as
> plain octal constants rather than those rather inscrutable macros. I
> certainly find that more readable myself.
> 
> (Yes, the existing parameter commits the same offences, but I'd rather
> clean that up separately than perpetuate it)

Thanks for pointing out. Got fixed in the new patchset.

> 
> > +MODULE_PARM_DESC(disable_msipolling,
> > +	"Disable MSI-based polling for CMD_SYNC completion.");
> > +
> >   enum pri_resp {
> >   	PRI_RESP_DENY = 0,
> >   	PRI_RESP_FAIL = 1,
> > @@ -980,6 +985,13 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd,
> struct arm_smmu_cmdq_ent *ent)
> >   	return 0;
> >   }
> >
> > +static bool arm_smmu_use_msipolling(struct arm_smmu_device *smmu)
> > +{
> > +	return !disable_msipolling &&
> > +	       smmu->features & ARM_SMMU_FEAT_COHERENCY &&
> > +	       smmu->features & ARM_SMMU_FEAT_MSI;
> > +}
> 
> I'd wrap this up into a new ARM_SMMU_OPT_MSIPOLL flag set at probe time,
> rather than constantly reevaluating this whole expression (now that it's
> no longer simply testing two adjacent bits of the same word).

Got it done in the new patchset. It turns out we only need to check one bit now with the new
patch:

-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
-	    smmu->features & ARM_SMMU_FEAT_COHERENCY)
+	if (smmu->options & ARM_SMMU_OPT_MSIPOLL)
 		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);


> 
> Robin.
> 
> > +
> >   static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct
> arm_smmu_device *smmu,
> >   					 u32 prod)
> >   {
> > @@ -992,8 +1004,7 @@ static void arm_smmu_cmdq_build_sync_cmd(u64
> *cmd, struct arm_smmu_device *smmu,
> >   	 * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI
> >   	 * payload, so the write will zero the entire command on that platform.
> >   	 */
> > -	if (smmu->features & ARM_SMMU_FEAT_MSI &&
> > -	    smmu->features & ARM_SMMU_FEAT_COHERENCY) {
> > +	if (arm_smmu_use_msipolling(smmu)) {
> >   		ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) *
> >   				   q->ent_dwords * 8;
> >   	}
> > @@ -1332,8 +1343,7 @@ static int
> __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
> >   static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device
> *smmu,
> >   					 struct arm_smmu_ll_queue *llq)
> >   {
> > -	if (smmu->features & ARM_SMMU_FEAT_MSI &&
> > -	    smmu->features & ARM_SMMU_FEAT_COHERENCY)
> > +	if (arm_smmu_use_msipolling(smmu))
> >   		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
> >
> >   	return __arm_smmu_cmdq_poll_until_consumed(smmu, llq);
> >

Thanks
Barry



More information about the linux-arm-kernel mailing list