[bug-report] possible performance problem in ret_to_user_from_irq

Jens Axboe axboe at kernel.dk
Wed Jan 4 06:45:43 PST 2023


On 1/4/23 12:04?AM, Hui Tang wrote:
> 
> 
> On 2023/1/3 22:59, Jens Axboe wrote:
>> On 1/3/23 7:34?AM, Russell King (Oracle) wrote:
>>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>>> On 1/3/23 3:06?AM, Russell King (Oracle) wrote:
>>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>>> hi folks.
>>>>>>
>>>>>> I found a performance problem which is introduced by commit
>>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>>> After the commit,  any bit in the range of 0..15 will cause
>>>>>> do_work_pending() to be invoked. More frequent do_work_pending()
>>>>>> invoked possible result in worse performance.
>>>>>>
>>>>>> Some of the tests I've done? as follows:
>>>>>> lmbench test            base        with patch
>>>>>> ./lat_ctx -P 1 -s 0  2        7.3167        11.04
>>>>>> ./lat_ctx -P 1 -s 16 2          8.0467        14.5367
>>>>>> ./lat_ctx -P 1 -s 64 2        7.8667        11.43
>>>>>> ./lat_ctx -P 1 -s 16 16        16.47        18.3667
>>>>>> ./lat_pipe -P 1            28.1671        44.7904
>>>>>>
>>>>>> libMicro-0.4.1 test        base        with patch
>>>>>> ./cascade_cond -E -C 200\
>>>>>>  -L -S -W -N "c_cond_1" -I 100    286.3333    358
>>>>>>
>>>>>> When I adjust test bit, the performance problem gone.
>>>>>> -    movs    r1, r1, lsl #16
>>>>>> +    ldr    r2, =#_TIF_WORK_MASK
>>>>>> +    tst    r1, r2
>>>>>>
>>>>>> Does anyone have a good suggestion for this problem?
>>>>>> should just test _TIF_WORK_MASK, as before?
>>>>>
>>>>> I think it should be fine - but I would suggest re-organising the
>>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>>> do_work_pending are not in the first 16 bits.
>>>>>
>>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>>> immediate assembly constant.
>>>>>
>>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>>> 20..23, and then 8 to 4.
>>>>
>>>> Like this?
>>>>
>>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>>> index aecc403b2880..7f092cb55a41 100644
>>>> --- a/arch/arm/include/asm/thread_info.h
>>>> +++ b/arch/arm/include/asm/thread_info.h
>>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>>  #define TIF_NEED_RESCHED    1    /* rescheduling necessary */
>>>>  #define TIF_NOTIFY_RESUME    2    /* callback before returning to user */
>>>>  #define TIF_UPROBE        3    /* breakpointed or singlestepping */
>>>> -#define TIF_SYSCALL_TRACE    4    /* syscall trace active */
>>>> -#define TIF_SYSCALL_AUDIT    5    /* syscall auditing active */
>>>> -#define TIF_SYSCALL_TRACEPOINT    6    /* syscall tracepoint instrumentation */
>>>> -#define TIF_SECCOMP        7    /* seccomp syscall filtering active */
>>>> -#define TIF_NOTIFY_SIGNAL    8    /* signal notifications exist */
>>>> +#define TIF_NOTIFY_SIGNAL    4    /* signal notifications exist */
>>>>
>>>>  #define TIF_USING_IWMMXT    17
>>>>  #define TIF_MEMDIE        18    /* is terminating due to OOM killer */
>>>> -#define TIF_RESTORE_SIGMASK    20
>>>> +#define TIF_RESTORE_SIGMASK    19
>>>> +#define TIF_SYSCALL_TRACE    20    /* syscall trace active */
>>>> +#define TIF_SYSCALL_AUDIT    21    /* syscall auditing active */
>>>> +#define TIF_SYSCALL_TRACEPOINT    22    /* syscall tracepoint instrumentation */
>>>> +#define TIF_SECCOMP        23    /* seccomp syscall filtering active */
>>>> +
>>>>
>>>>  #define _TIF_SIGPENDING        (1 << TIF_SIGPENDING)
>>>>  #define _TIF_NEED_RESCHED    (1 << TIF_NEED_RESCHED)
>>>
>>> Yep, LGTM, thanks.
>>
>> Hui Tang, can you give it a whirl? Just checked and it applies to
>> 5.10-stable as well, just with a slight offset.
> 
> With the latest patch, the testcase rusults shown in the 'new patch' column.
> I also retested previous commit of 32d59773da38, the results shown in the 'base' column.
> 
> lmbench test                base        32d59773da38      new patch
> ./lat_ctx -P 1 -s 0  2      8.04          11.04        8.25
> ./lat_ctx -P 1 -s 16 2      9.08          14.5367         9.26
> ./lat_ctx -P 1 -s 64 2      8.78          11.43        8.71
> ./lat_ctx -P 1 -s 16 16     17.22         18.3667         17.32
> ./lat_pipe -P 1             43.5021       44.7904         41.3729
> 
> libMicro-0.4.1 test              base  32d59773da38    new patch
> ./cascade_cond -E -C 200\
>  -L -S -W -N "c_cond_1" -I 100    281    358          281
> 
> The performance problem also seem to gone with the latest patch, thanks.

Thanks for testing! I'm going to send it out and add your tested-by (and
reported-by).

-- 
Jens Axboe




More information about the linux-arm-kernel mailing list