[PATCH] arm64: kernel: Use a separate stack for irq interrupts.

Tue Sep 8 09:47:50 PDT 2015

On 08/09/15 15:54, Jungseok Lee wrote:
> On Sep 7, 2015, at 11:36 PM, James Morse wrote:
> 
> Hi James,
> 
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index e16351819fed..d42371f3f5a1 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -190,10 +190,37 @@ tsk	.req	x28		// current thread_info
>>  * Interrupt handling.
>>  */
>> 	.macro	irq_handler
>> -	adrp	x1, handle_arch_irq
>> -	ldr	x1, [x1, #:lo12:handle_arch_irq]
>> -	mov	x0, sp
>> +	mrs	x21, tpidr_el1
>> +	adr_l	x20, irq_sp
>> +	add	x20, x20, x21
>> +
>> +	ldr	x21, [x20]
>> +	mov	x20, sp
>> +
>> +	mov	x0, x21
>> +	mov	x1, x20
>> +	bl	irq_copy_thread_info
>> +
>> +	/* test for recursive use of irq_sp */
>> +	cbz	w0, 1f
>> +	mrs	x30, elr_el1
>> +	mov	sp, x21
>> +
>> +	/*
>> +	 * Create a fake stack frame to bump unwind_frame() onto the original
>> +	 * stack. This relies on x29 not being clobbered by kernel_entry().
>> +	 */
>> +	push	x29, x30
> 
> It is the most challenging item to handle a frame pointer in this context.
> 
> As I mentioned previously, a stack tracer on ftrace is not supported yet.
> How about decoupling "IRQ Stack feature" from "Stack Tracer Fix"?

Yes - I discovered today this was more complicated than I thought. I will
need to do some more reading.

>> +
>> +1:	ldr_l	x1, handle_arch_irq
>> +	mov     x0, x20
>> 	blr	x1
>> +
>> +	mov	x0, x20
>> +	mov	x1, x21
>> +	bl	irq_copy_thread_info
>> +	mov	sp, x20
>> +
>> 	.endm
>>
>> 	.text
>> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
>> index 463fa2e7e34c..10b57a006da8 100644
>> --- a/arch/arm64/kernel/irq.c
>> +++ b/arch/arm64/kernel/irq.c
>> @@ -26,11 +26,14 @@
>> #include <linux/smp.h>
>> #include <linux/init.h>
>> #include <linux/irqchip.h>
>> +#include <linux/percpu.h>
>> #include <linux/seq_file.h>
>> #include <linux/ratelimit.h>
>>
>> unsigned long irq_err_count;
>>
>> +DEFINE_PER_CPU(unsigned long, irq_sp) = 0;
>> +
>> int arch_show_interrupts(struct seq_file *p, int prec)
>> {
>> #ifdef CONFIG_SMP
>> @@ -55,6 +58,10 @@ void __init init_IRQ(void)
>> 	irqchip_init();
>> 	if (!handle_arch_irq)
>> 		panic("No interrupt controller found.");
>> +
>> +	/* Allocate an irq stack for the boot cpu */
>> +	if (alloc_irq_stack(smp_processor_id()))
>> +		panic("Failed to allocate irq stack for boot cpu.");
>> }
>>
>> #ifdef CONFIG_HOTPLUG_CPU
>> @@ -117,3 +124,48 @@ void migrate_irqs(void)
>> 	local_irq_restore(flags);
>> }
>> #endif /* CONFIG_HOTPLUG_CPU */
>> +
>> +/* Allocate an irq_stack for a cpu that is about to be brought up. */
>> +int alloc_irq_stack(unsigned int cpu)
>> +{
>> +	struct page *irq_stack_page;
>> +	union thread_union *irq_stack;
>> +
>> +	/* reuse stack allocated previously */
>> +	if (per_cpu(irq_sp, cpu))
>> +		return 0;
> 
> I'd like to avoid even this simple check since CPU hotplug could be heavily
> used for power management.

I don't think its a problem:
__cpu_up() contains a call to wait_for_completion_timeout() (which could
eventually end up in the scheduler), so I don't think it could ever be on a
'really hot' path.

For really frequent hotplug-like power management, cpu_suspend() makes use
of firmware support to power-off cores - from what I can see it doesn't use
__cpu_up().

>> +
>> +	irq_stack_page = alloc_kmem_pages(THREADINFO_GFP, THREAD_SIZE_ORDER);
>> +	if (!irq_stack_page) {
>> +		pr_crit("CPU%u: failed to allocate irq stack for cpu %u\n",
>> +			smp_processor_id(), cpu);
>> +		return -ENOMEM;
>> +	}
>> +	irq_stack = page_address(irq_stack_page);
>> +
>> +	per_cpu(irq_sp, cpu) = (unsigned long)irq_stack->stack
>> +			       + THREAD_START_SP;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Copy struct thread_info between two stacks, and update current->stack.
>> + * This is used when moving to the irq exception stack.
>> + * Changing current->stack is necessary so that non-arch thread_info writers
>> + * don't use the new thread_info->task->stack to find the old thread_info when
>> + * setting flags like TIF_NEED_RESCHED...
>> + */
>> +asmlinkage int irq_copy_thread_info(unsigned long dst_sp, unsigned long src_sp)
>> +{
>> +	struct thread_info *src = get_thread_info(src_sp);
>> +	struct thread_info *dst = get_thread_info(dst_sp);
>> +
>> +	if (dst == src)
>> +		return 0;
>> +
>> +	*dst = *src;
>> +	current->stack = dst;
>> +
>> +	return 1;
>> +}
> 
> Although this is 32-byte memcpy, sizeof(struct thread_info), this function is called
> twice to handle a single interrupt. Isn's it too expensive?
> 
> This is a major difference between my approach and this patch. I think we should get
> some feedbacks on struct thread_info information management for v2 preparation.

Agreed.

The other difference, as Akashi Takahiro pointed out, is the behaviour of
object_is_on_stack(). What should this return for an object on an irq stack?

Thanks,

James