[RFC PATCH 0/3] Implement IRQ stack on ARM64
Jungseok Lee
jungseoklee85 at gmail.com
Fri Sep 4 07:23:04 PDT 2015
The ARM64 kernel allocates a 16KB kernel stack when creating a process. On
low-memory platforms running heavy userland workloads, this order-2
allocation request causes memory pressure and performance degradation at
the same time, because the VM page allocator frequently falls into its
slowpath, which triggers page reclaim and compaction.
I believe that one of the best solutions is to reduce the kernel stack size.
According to the following data from the stack tracer (with some fixes
applied [1]), a separate IRQ stack would greatly help to decrease kernel
stack depth.
Depth Size Location (51 entries)
----- ---- --------
0) 5352 96 _raw_spin_unlock_irqrestore+0x1c/0x60
1) 5256 48 gic_raise_softirq+0xa0/0xbc
2) 5208 80 smp_cross_call+0x40/0xbc
3) 5128 48 smp_send_reschedule+0x38/0x48
4) 5080 32 trigger_load_balance+0x184/0x29c
5) 5048 112 scheduler_tick+0xac/0x104
6) 4936 64 update_process_times+0x5c/0x74
7) 4872 32 tick_sched_handle.isra.15+0x38/0x7c
8) 4840 48 tick_sched_timer+0x48/0x90
9) 4792 48 __run_hrtimer+0x60/0x258
10) 4744 64 hrtimer_interrupt+0xe8/0x260
11) 4680 128 arch_timer_handler_virt+0x38/0x48
12) 4552 32 handle_percpu_devid_irq+0x84/0x188
13) 4520 64 generic_handle_irq+0x38/0x54
14) 4456 32 __handle_domain_irq+0x68/0xbc
15) 4424 64 gic_handle_irq+0x38/0x88
16) 4360 280 el1_irq+0x64/0xd8
17) 4080 168 ftrace_ops_no_ops+0xb4/0x16c
18) 3912 32 ftrace_call+0x0/0x4
19) 3880 144 __alloc_skb+0x48/0x180
20) 3736 96 alloc_skb_with_frags+0x74/0x234
21) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
22) 3528 160 sock_alloc_send_skb+0x44/0x54
23) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
24) 3304 224 ip_append_data.part.42+0x98/0xe8
25) 3080 112 ip_append_data+0x68/0x7c
26) 2968 96 icmp_push_reply+0x7c/0x144
27) 2872 96 icmp_send+0x3c0/0x3c8
28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
29) 2584 96 udp_rcv+0x2c/0x3c
30) 2488 32 ip_local_deliver+0xa0/0x224
31) 2456 48 ip_rcv+0x360/0x57c
32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
33) 2344 128 __netif_receive_skb+0x24/0x84
34) 2216 32 process_backlog+0x9c/0x15c
35) 2184 80 net_rx_action+0x1ec/0x32c
36) 2104 160 __do_softirq+0x114/0x2f0
37) 1944 128 do_softirq+0x60/0x68
38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
39) 1784 32 ip_finish_output+0x1f4/0xabc
40) 1752 96 ip_output+0xf0/0x120
41) 1656 64 ip_local_out_sk+0x44/0x54
42) 1592 32 ip_send_skb+0x24/0xbc
43) 1560 48 udp_send_skb+0x1b4/0x2f4
44) 1512 80 udp_sendmsg+0x2a8/0x7a0
45) 1432 272 inet_sendmsg+0xa0/0xd0
46) 1160 48 sock_sendmsg+0x30/0x78
47) 1112 32 ___sys_sendmsg+0x15c/0x26c
48) 1080 400 __sys_sendmmsg+0x94/0x180
49) 680 320 SyS_sendmmsg+0x38/0x54
50) 360 360 el0_svc_naked+0x20/0x28
So, this patch set implements a separate percpu IRQ stack.
AFAIK, the ftrace stack tracer does not work well yet. That is the only
item left on my to-do list at the moment.
This series is based on 4.2-rc5 and was developed on a DragonBoard 410c. It
has been validated on two different setups: 4.2-rc5 + Linaro Ubuntu 15.04,
and 3.10 + Android.
After this merge window, I will rebase this series and resend it.
Any comments or feedback are always welcome.
Thanks in advance!
[1]: https://lkml.org/lkml/2015/7/13/29
Jungseok Lee (3):
arm64: entry: Remove unnecessary calculation for S_SP in EL1h
arm64: Introduce IRQ stack
arm64: Reduce kernel stack size when using IRQ stack
arch/arm64/Kconfig.debug | 10 ++
arch/arm64/include/asm/irq.h | 8 ++
arch/arm64/include/asm/thread_info.h | 19 ++++
arch/arm64/kernel/asm-offsets.c | 8 ++
arch/arm64/kernel/entry.S | 85 +++++++++++++++-
arch/arm64/kernel/head.S | 7 ++
arch/arm64/kernel/irq.c | 18 ++++
7 files changed, 150 insertions(+), 5 deletions(-)
--
1.9.1