[PATCH] irqchip/gic-v3: use dsb(ishst) to synchronize data to smp before issuing ipi
Barry Song
21cnbao at gmail.com
Sat Feb 19 17:33:51 PST 2022
> So there is no much difference between vanilla and patched kernel.
Sorry, let me correct that.
I realized I should write some data before sending the IPI, so I have changed the
module as below:
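#include <linux/cache.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/smp.h>
#include <linux/timekeeping.h>

/* Seven dummy variables, each on its own cache line, written before every IPI. */
volatile int data0 ____cacheline_aligned;
volatile int data1 ____cacheline_aligned;
volatile int data2 ____cacheline_aligned;
volatile int data3 ____cacheline_aligned;
volatile int data4 ____cacheline_aligned;
volatile int data5 ____cacheline_aligned;
volatile int data6 ____cacheline_aligned;

/* Empty IPI handler: only the cost of the cross-call itself is measured. */
static void ipi_latency_func(void *val)
{
}

static int __init ipi_latency_init(void)
{
	ktime_t stime, etime, delta;
	int cpu, i;
	/* Source CPU, for the log only; raw_ variant because init runs preemptible. */
	int start = raw_smp_processor_id();

	stime = ktime_get();
	for (i = 0; i < 1000; i++)
		for (cpu = 0; cpu < 96; cpu++) {	/* hard-coded: CPUs 0-95 */
			/* Write some data before each IPI so there are stores to synchronize. */
			data0 = data1 = data2 = data3 = data4 = data5 = data6 = cpu;
			/* wait=1: return only after the target CPU has run the handler. */
			smp_call_function_single(cpu, ipi_latency_func, NULL, 1);
		}
	etime = ktime_get();
	delta = ktime_sub(etime, stime);

	printk("%s ipi from cpu%d to cpu0-95 delta of 1000times:%lld\n",
	       __func__, start, ktime_to_ns(delta));

	return 0;
}
module_init(ipi_latency_init);

static void __exit ipi_latency_exit(void)
{
}
module_exit(ipi_latency_exit);

MODULE_DESCRIPTION("IPI benchmark");
MODULE_LICENSE("GPL");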
After that, I can see a small difference between the patched kernel and vanilla, about 2% on average over the runs below:
vanilla:
[ 375.220131] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126757449
[ 375.382596] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126784249
[ 375.537975] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126177703
[ 375.686823] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127022281
[ 375.849967] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126184883
[ 375.999173] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127374585
[ 376.149565] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:125778089
[ 376.298743] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126974441
[ 376.451125] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127357625
[ 376.606006] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126228184
[ 371.405378] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151851181
[ 371.591642] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151568608
[ 371.767906] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151853441
[ 371.944031] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:152065453
[ 372.114085] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:146122093
[ 372.291345] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151379636
[ 372.459812] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151854411
[ 372.629708] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:145750720
[ 372.807574] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151629448
[ 372.994979] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:151050253
patched kernel:
[ 105.598815] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124467401
[ 105.748368] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123474209
[ 105.900400] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123558497
[ 106.043890] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122993951
[ 106.191845] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122984223
[ 106.348215] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123323609
[ 106.501448] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124507583
[ 106.656358] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123386963
[ 106.804367] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123340664
[ 106.956331] ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123285324
[ 108.930802] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:143616067
[ 109.094750] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:148969821
[ 109.267428] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149648418
[ 109.443274] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149448903
[ 109.621760] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:147882917
[ 109.794611] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:148700282
[ 109.975197] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149050595
[ 110.141543] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:143566604
[ 110.315213] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:149202898
[ 110.491008] ipi_latency_init ipi from cpu48 to cpu0-95 delta of 1000times:148958261
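As you can see, with cpu0 as the source, vanilla takes 125xxxxxx-127xxxxxx ns, while the
patched kernel takes 122xxxxxx-124xxxxxx ns. With cpu48 as the source the runs are noisier:
vanilla takes 145xxxxxx-152xxxxxx ns and the patched kernel takes 143xxxxxx-149xxxxxx ns.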
Thanks
Barry
More information about the linux-arm-kernel
mailing list