[PATCH v3] iommu/arm-smmu-v3: permit users to disable MSI polling
Song Bao Hua (Barry Song)
song.bao.hua at hisilicon.com
Mon Aug 3 16:40:07 EDT 2020
> -----Original Message-----
> From: John Garry
> Sent: Tuesday, August 4, 2020 3:34 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>; will at kernel.org;
> robin.murphy at arm.com; joro at 8bytes.org; iommu at lists.linux-foundation.org
> Cc: Zengtao (B) <prime.zeng at hisilicon.com>;
> linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH v3] iommu/arm-smmu-v3: permit users to disable MSI polling
>
> On 01/08/2020 08:47, Barry Song wrote:
> > Polling by MSI isn't necessarily faster than polling by SEV. Tests on
> > hi1620 show hns3 100G NIC network throughput can improve from 25G to
> > 27G if we disable MSI polling while running 16 netperf threads sending
> > UDP packets in size 32KB.
>
> BTW, Do we have any more results than this? This is just one scenario.
>
John, it is more than one scenario. A micro-benchmark shows polling by SEV has lower latency
than polling by MSI, which motivated me to verify with a real workload. For this network case, if we
set the thread count to 1 rather than 16, network TX throughput improves from 7Gbps to 7.7Gbps.
> How about your micro-benchmark, which allows you to set the number of
> CPUs?
The micro-benchmark works like this:
1. Send a CMD_SYNC in an empty command queue;
2. Poll for completion of this CMD_SYNC by MSI or SEV.
The polling latency decreases by about 80ns: without this patch it was
about 270ns, with this patch it is about 190ns.
>
> Thanks,
> John
>
> > This patch provides a command line option so that users can decide to
> > use MSI polling or not based on their tests.
> >
> > Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>
> > ---
> > -v3:
> > * rebase on top of linux-next as arm-smmu-v3.c has moved;
> > * provide a command line option
> >
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 7196207be7ea..89d3cb391fef 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -418,6 +418,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
> > MODULE_PARM_DESC(disable_bypass,
> > "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
> >
> > +static bool disable_msipolling;
> > +module_param_named(disable_msipolling, disable_msipolling, bool, S_IRUGO);
> > +MODULE_PARM_DESC(disable_msipolling,
> > + "Disable MSI-based polling for CMD_SYNC completion.");
> > +
> > enum pri_resp {
> > PRI_RESP_DENY = 0,
> > PRI_RESP_FAIL = 1,
> > @@ -980,6 +985,13 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> > return 0;
> > }
> >
> > +static bool arm_smmu_use_msipolling(struct arm_smmu_device *smmu)
> > +{
> > + return !disable_msipolling &&
> > + smmu->features & ARM_SMMU_FEAT_COHERENCY &&
> > + smmu->features & ARM_SMMU_FEAT_MSI;
> > +}
> > +
> > static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
> > u32 prod)
> > {
> > @@ -992,8 +1004,7 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
> > * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI
> > * payload, so the write will zero the entire command on that platform.
> > */
> > - if (smmu->features & ARM_SMMU_FEAT_MSI &&
> > - smmu->features & ARM_SMMU_FEAT_COHERENCY) {
> > + if (arm_smmu_use_msipolling(smmu)) {
> > ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) *
> > q->ent_dwords * 8;
> > }
> > @@ -1332,8 +1343,7 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
> > static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
> > struct arm_smmu_ll_queue *llq)
> > {
> > - if (smmu->features & ARM_SMMU_FEAT_MSI &&
> > - smmu->features & ARM_SMMU_FEAT_COHERENCY)
> > + if (arm_smmu_use_msipolling(smmu))
> > return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
> >
> > return __arm_smmu_cmdq_poll_until_consumed(smmu, llq);
> >
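For anyone wanting to try this, the new knob can be set on the kernel command line at boot (the parameter prefix below is assumed from the driver's module name; the kernel treats '-' and '_' in parameter names interchangeably):

```
arm-smmu-v3.disable_msipolling=1
```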
Thanks
Barry