[PATCH] irqchip/gic-v3: use dsb(ishst) to synchronize data to smp before issuing ipi

Barry Song 21cnbao at gmail.com
Sat Feb 19 17:33:51 PST 2022


> So there is no much difference between vanilla and patched kernel.

Sorry, let me correct it.

I realized I should write some data before sending the IPI, so I have changed
the module as follows:

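#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/cache.h>
#include <linux/smp.h>
#include <linux/ktime.h>
#include <linux/timekeeping.h>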

/* Seven variables, each aligned to its own cache line; written before every IPI. */
volatile int data0 ____cacheline_aligned;
volatile int data1 ____cacheline_aligned;
volatile int data2 ____cacheline_aligned;
volatile int data3 ____cacheline_aligned;
volatile int data4 ____cacheline_aligned;
volatile int data5 ____cacheline_aligned;
volatile int data6 ____cacheline_aligned;

/* Empty IPI handler: the target CPU does no work in it. */
static void ipi_latency_func(void *val)
{
}

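static int __init ipi_latency_init(void)
{
        ktime_t stime, etime, delta;
        int cpu, i;
        /* Only used for reporting; the raw_ variant is safe in preemptible context. */
        int start = raw_smp_processor_id();

        stime = ktime_get();
        for (i = 0; i < 1000; i++) {
                /* The test machine has 96 CPUs (cpu0-95). */
                for (cpu = 0; cpu < 96; cpu++) {
                        /* Write some data so there are stores pending before the IPI. */
                        data0 = data1 = data2 = data3 = data4 = data5 = data6 = cpu;
                        /* wait=1: return only after the target CPU has run the handler. */
                        smp_call_function_single(cpu, ipi_latency_func, NULL, 1);
                }
        }
        etime = ktime_get();

        delta = ktime_sub(etime, stime);

        pr_info("%s ipi from cpu%d to cpu0-95 delta of 1000times:%lld\n",
                __func__, start, ktime_to_ns(delta));

        return 0;
}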
module_init(ipi_latency_init);

static void __exit ipi_latency_exit(void)
{
}
module_exit(ipi_latency_exit);

MODULE_DESCRIPTION("IPI benchmark");
MODULE_LICENSE("GPL");

After that, I can see a ~1% difference between the patched kernel and vanilla:

vanilla:
[  375.220131] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126757449
[  375.382596] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126784249
[  375.537975] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126177703
[  375.686823] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127022281
[  375.849967] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126184883
[  375.999173] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127374585
[  376.149565] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:125778089
[  376.298743] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126974441
[  376.451125] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127357625
[  376.606006] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126228184

[  371.405378] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151851181
[  371.591642] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151568608
[  371.767906] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151853441
[  371.944031] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:152065453
[  372.114085] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:146122093
[  372.291345] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151379636
[  372.459812] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151854411
[  372.629708] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:145750720
[  372.807574] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151629448
[  372.994979] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151050253

patched kernel:
[  105.598815] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124467401
[  105.748368] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123474209
[  105.900400] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123558497
[  106.043890] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122993951
[  106.191845] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122984223
[  106.348215] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123323609
[  106.501448] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124507583
[  106.656358] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123386963
[  106.804367] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123340664
[  106.956331] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123285324

[  108.930802] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:143616067
[  109.094750] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:148969821
[  109.267428] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149648418
[  109.443274] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149448903
[  109.621760] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:147882917
[  109.794611] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:148700282
[  109.975197] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149050595
[  110.141543] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:143566604
[  110.315213] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149202898
[  110.491008] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:148958261

As you can see, with cpu0 as the source, vanilla takes 125xxxxxx-127xxxxxx ns, while
the patched kernel takes 122xxxxxx-124xxxxxx ns.
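
To put a single number on that gap, the cpu0 runs can be averaged and compared.
The following is only a rough user-space sketch and not part of the test module:
the helper names (mean_ns, relative_saving, per_ipi_ns) are made up for
illustration, and the sample values are copied from the cpu0 logs above. It
assumes each run issues 1000 * 96 IPIs, as in the loop in the module.

```c
#include <stddef.h>
#include <stdio.h>

#define NSAMPLES     10
#define IPIS_PER_RUN (1000 * 96)  /* 1000 iterations x 96 target CPUs */

/* cpu0-as-source deltas in ns, copied from the vanilla log above. */
const long long vanilla_cpu0[NSAMPLES] = {
	126757449, 126784249, 126177703, 127022281, 126184883,
	127374585, 125778089, 126974441, 127357625, 126228184,
};

/* cpu0-as-source deltas in ns, copied from the patched-kernel log above. */
const long long patched_cpu0[NSAMPLES] = {
	124467401, 123474209, 123558497, 122993951, 122984223,
	123323609, 124507583, 123386963, 123340664, 123285324,
};

/* Arithmetic mean of n samples. */
double mean_ns(const long long *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += (double)v[i];
	return sum / (double)n;
}

/* Time saved by 'patched' relative to 'base', as a fraction of 'base'. */
double relative_saving(double base, double patched)
{
	return (base - patched) / base;
}

/* Average cost of one smp_call_function_single() round trip in a run. */
double per_ipi_ns(double run_ns)
{
	return run_ns / (double)IPIS_PER_RUN;
}

/* Print the summary for the cpu0 runs; call this from main(). */
void report_cpu0(void)
{
	double v = mean_ns(vanilla_cpu0, NSAMPLES);
	double p = mean_ns(patched_cpu0, NSAMPLES);

	printf("vanilla mean: %.0f ns (%.1f ns per IPI)\n", v, per_ipi_ns(v));
	printf("patched mean: %.0f ns (%.1f ns per IPI)\n", p, per_ipi_ns(p));
	printf("saving:       %.2f%%\n", 100.0 * relative_saving(v, p));
}
```

Calling report_cpu0() from a small main() prints both means, the per-IPI cost
and the percentage saved. The same helpers can be pointed at the cpu48 numbers
by adding two more arrays.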

Thanks
Barry


