[PATCH] IPI performance benchmark

Wed Dec 13 03:31:56 PST 2017

On 12/13/2017 12:23 PM, Yury Norov wrote:
> On Mon, Dec 11, 2017 at 05:30:25PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 12/11/2017 03:55 PM, Yury Norov wrote:
>>> On Mon, Dec 11, 2017 at 03:35:02PM +0100, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 12/11/2017 03:16 PM, Yury Norov wrote:
>>>>> This benchmark sends many IPIs in different modes and measures the
>>>>> time for IPI delivery (first column) and the total time, i.e. including
>>>>> the time until the sender sees the receipt acknowledged (second column).
>>>>>
>>>>> The scenarios are:
>>>>> Dry-run:	Do everything except actually sending the IPI. Useful
>>>>> 		to estimate system overhead.
>>>>> Self-IPI:	Send an IPI to the current CPU.
>>>>> Normal IPI:	Send an IPI to another CPU.
>>>>> Broadcast IPI:	Send a broadcast IPI to all online CPUs.
>>>>>
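To make the two columns concrete, here is a userspace analogue. It is not the module's bench_ipi(): a flag written by one thread and polled by another stands in for an IPI, and the names sim_bench(), receiver() and pending are hypothetical. The first sum covers the time from sending until the receiver timestamps the "interrupt"; the second also covers the wait until the sender sees the acknowledgment. In dry-run mode the timing loop runs but nothing is sent, so the delivery sum stays 0. It assumes POSIX threads (link with -pthread on older glibc).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_int pending;      /* 1 while a simulated "IPI" is in flight */
static atomic_llong recv_ns;    /* receiver's timestamp for the last delivery */
static atomic_bool stop_rx;     /* tells the receiver thread to exit */

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Receiver: timestamp the "interrupt", then acknowledge it. */
static void *receiver(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_rx)) {
		if (atomic_load(&pending)) {
			atomic_store(&recv_ns, now_ns());
			atomic_store(&pending, 0);	/* acknowledge */
		} else {
			sched_yield();
		}
	}
	return NULL;
}

/*
 * Hypothetical helper: accumulate delivery time (send -> receiver sees it)
 * and total time (send -> sender sees the ack) over ntimes round trips.
 * With dry_run set, the timing loop runs but nothing is sent.
 */
static int sim_bench(int ntimes, bool dry_run, long long *ipi, long long *total)
{
	pthread_t rx;

	assert(ntimes > 0);
	*ipi = *total = 0;
	atomic_store(&stop_rx, false);
	if (pthread_create(&rx, NULL, receiver, NULL))
		return -1;

	for (int i = 0; i < ntimes; i++) {
		long long t0 = now_ns();

		if (!dry_run) {
			atomic_store(&pending, 1);	/* "send" */
			while (atomic_load(&pending))	/* wait for the ack */
				sched_yield();
			*ipi += atomic_load(&recv_ns) - t0;
		}
		*total += now_ns() - t0;
	}

	atomic_store(&stop_rx, true);
	pthread_join(rx, NULL);
	return 0;
}
```

A real IPI goes through the interrupt controller and, in a guest, involves the exits discussed in this thread, so absolute numbers from this analogue are not comparable with the results posted here.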
>>>>> For virtualized guests, sending and receiving IPIs causes a guest exit.
>>>>> I used this test to measure the performance impact on the KVM subsystem
>>>>> of Christoffer Dall's series "Optimize KVM/ARM for VHE systems".
>>>>>
>>>>> https://www.spinics.net/lists/kvm/msg156755.html
>>>>>
>>>>> Test machine is ThunderX2 with 112 online CPUs. Below are the results,
>>>>> normalized to the host dry-run time. Smaller is better.
>>>>>
>>>>> Host, v4.14:
>>>>> Dry-run:          0      1
>>>>> Self-IPI:         9     18
>>>>> Normal IPI:      81    110
>>>>> Broadcast IPI:    0   2106
>>>>>
>>>>> Guest, v4.14:
>>>>> Dry-run:          0      1
>>>>> Self-IPI:        10     18
>>>>> Normal IPI:     305    525
>>>>> Broadcast IPI:    0   9729
>>>>>
>>>>> Guest, v4.14 + VHE:
>>>>> Dry-run:          0      1
>>>>> Self-IPI:         9     18
>>>>> Normal IPI:     176    343
>>>>> Broadcast IPI:    0   9885
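A minimal sketch of the normalization described above, assuming each figure is a raw nanosecond time divided by the host dry-run total time and rounded to the nearest integer. The helper name normalize() is hypothetical and the rounding rule is an assumption; the thread does not say exactly how the posted numbers were derived.

```c
#include <assert.h>

/*
 * Hypothetical helper: express a raw time (ns) as a multiple of the host
 * dry-run total time (ns), rounded to the nearest integer. Rounding to
 * nearest is an assumption, not taken from the patch.
 */
static long long normalize(long long raw_ns, long long dry_run_total_ns)
{
	assert(raw_ns >= 0 && dry_run_total_ns > 0);
	return (raw_ns + dry_run_total_ns / 2) / dry_run_total_ns;
}
```

By construction, normalize(d, d) is 1 for any positive dry-run total d, which is why the host dry-run row reads 1 in the second column.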
>> [...]
>>>>> +static int __init init_bench_ipi(void)
>>>>> +{
>>>>> +	ktime_t ipi, total;
>>>>> +	int ret;
>>>>> +
>>>>> +	ret = bench_ipi(NTIMES, DRY_RUN, &ipi, &total);
>>>>> +	if (ret)
>>>>> +		pr_err("Dry-run FAILED: %d\n", ret);
>>>>> +	else
>>>>> +		pr_info("Dry-run:       %18lld, %18lld ns\n",
>>>>> +			ktime_to_ns(ipi), ktime_to_ns(total));
>>>>
>>>> You do not divide by NTIMES here to calculate the average value. Is that intended?
>>>
>>> I think it's clearer to express all results as multiples of the dry-run
>>> time, as I did in the patch description. So on the kernel side I expose
>>> the raw data and calculate the final values after the tests finish.
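Why skipping the division by NTIMES does not change the normalized figures, assuming every scenario, including the dry run, runs the same NTIMES iterations (the quoted code shows this only for the dry run). Let $T_x$ be the raw time accumulated over $N = \texttt{NTIMES}$ iterations of scenario $x$, and $T_{\mathrm{dry}}$ the host dry-run total over the same $N$ iterations:

```latex
\frac{T_x / N}{T_{\mathrm{dry}} / N} = \frac{T_x}{T_{\mathrm{dry}}}
```

The per-iteration average only matters when the module prints raw nanoseconds, as the quoted dry-run print does.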
>>
>> I think it is highly confusing that the output in the patch description does
>> not match the output of the real module. Can you at least make them match?
> 
> I think so. That's why I noted that the results are normalized to the host
> dry-run time; moreover, normalized numbers are small and easier for humans to read.
> 
> I was advised not to publish raw data, as you'll understand. If this is
> a blocker, I can post results from a QEMU-hosted kernel.

You could just post some example data from any random x86 laptop. I think it
would be good to have the output in the patch description match the real output.