[PATCH v4 4/7] arm64: Handle early CPU boot failures
Catalin Marinas
catalin.marinas at arm.com
Wed Feb 3 04:57:38 PST 2016
Hi Suzuki,
On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K. Poulose wrote:
> +/* Values for secondary_data.status */
> +
> +#define CPU_MMU_OFF -1
> +#define CPU_BOOT_SUCCESS 0
> +/* The cpu invoked ops->cpu_die, synchronise it with cpu_kill */
> +#define CPU_KILL_ME 1
> +/* The cpu couldn't die gracefully and is looping in the kernel */
> +#define CPU_STUCK_IN_KERNEL 2
> +/* Fatal system error detected by secondary CPU, crash the system */
> +#define CPU_PANIC_KERNEL 3
Please add parentheses around these values, just in case (I added them
locally).
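As a rough illustration of what wrapping macro values in parentheses protects against, here is a minimal C sketch. The CPU_* values are the ones quoted above with parentheses added; BAD_SUM, GOOD_SUM and the three helper functions are hypothetical and exist only for this example. For a bare literal such as -1 the parentheses are largely defensive, while for an expression they change the result.

```c
#include <assert.h>

/* Status values from the quoted patch, parenthesised as requested. */
#define CPU_MMU_OFF		(-1)
#define CPU_BOOT_SUCCESS	(0)
#define CPU_KILL_ME		(1)
#define CPU_STUCK_IN_KERNEL	(2)
#define CPU_PANIC_KERNEL	(3)

/* Hypothetical macros, used only to show the general hazard. */
#define BAD_SUM		1 + 2
#define GOOD_SUM	(1 + 2)

/* The only negative status must stay below CPU_BOOT_SUCCESS. */
static_assert(CPU_MMU_OFF < CPU_BOOT_SUCCESS, "CPU_MMU_OFF must be negative");

/* Expands to 1 + 2 * 2, so operator precedence leaks out of the macro. */
int bad_sum_doubled(void)
{
	return BAD_SUM * 2;
}

/* Expands to (1 + 2) * 2, which is what the macro name suggests. */
int good_sum_doubled(void)
{
	return GOOD_SUM * 2;
}

/* Expands to -(-1); the parentheses keep the two minus signs apart. */
int negated_mmu_off(void)
{
	return -CPU_MMU_OFF;
}
```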
> /*
> + * The booting CPU updates the failed status, with MMU turned off,
> + * below which lies in head.text to make sure it doesn't share the same writeback
> + * granule. So that we can invalidate it properly.
I can't really parse this (it looks like punctuation in the wrong place;
also "share the same..." with what?).
> + *
> + * update_early_cpu_boot_status tmp, status
> + * - Corrupts tmp, x0, x1
> + * - Writes 'status' to __early_cpu_boot_status and makes sure
> + * it is committed to memory.
> + */
> +
> + .macro update_early_cpu_boot_status tmp, status
> + mov \tmp, lr
> + adrp x0, __early_cpu_boot_status
> + add x0, x0, #:lo12:__early_cpu_boot_status
Nitpick: you could use the adr_l macro.
> + mov x1, #\status
> + str x1, [x0]
> + add x1, x0, 4
> + bl __inval_cache_range
> + mov lr, \tmp
> + .endm
If the CPU that's currently booting has the MMU off, what's the point of
invalidating the cache here? The operation may not even be broadcast to
the other CPU. So you actually need the invalidation before reading the
status on the primary CPU.
> +
> +ENTRY(__early_cpu_boot_status)
> + .long 0
> +END(__early_cpu_boot_status)
I think we should just do the same as for __boot_cpu_mode and place it in
the .data..cacheline_aligned section. You can always use the safe
clean+invalidate before reading the value, so that we don't need to care
much about the write-back granule.
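To show what cache-line alignment gives a variable like this, here is a small userspace sketch, not kernel code. ASSUMED_CACHE_LINE, struct boot_status_line, status_line_model and starts_on_line_boundary() are hypothetical names, and the 64-byte line size is an assumption for illustration; the write-back granule on real hardware can be larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size, for illustration only. */
#define ASSUMED_CACHE_LINE 64

/*
 * Hypothetical stand-in for a cache-line aligned status word. The
 * aligned attribute also pads the struct to a full line, so no other
 * object can share the line with it.
 */
struct boot_status_line {
	long status;
} __attribute__((aligned(ASSUMED_CACHE_LINE)));

/* The padding makes the object exactly one assumed line long. */
static_assert(sizeof(struct boot_status_line) == ASSUMED_CACHE_LINE,
	      "status word should occupy one full line");

struct boot_status_line status_line_model = { .status = 0 };

/* Returns 1 when the object starts on an assumed line boundary. */
int starts_on_line_boundary(const void *p)
{
	return ((uintptr_t)p % ASSUMED_CACHE_LINE) == 0;
}
```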
> @@ -89,12 +101,14 @@ static DECLARE_COMPLETION(cpu_running);
> int __cpu_up(unsigned int cpu, struct task_struct *idle)
> {
> int ret;
> + int status;
>
> /*
> * We need to tell the secondary core where to find its stack and the
> * page tables.
> */
> secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
> + update_cpu_boot_status(CPU_MMU_OFF);
> __flush_dcache_area(&secondary_data, sizeof(secondary_data));
>
> /*
> @@ -117,7 +131,35 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> }
>
> + /* Make sure the update to status is visible */
> + smp_rmb();
Which status? In relation to what?
> secondary_data.stack = NULL;
> + status = READ_ONCE(secondary_data.status);
> + if (ret && status) {
> +
> + if (status == CPU_MMU_OFF)
> + status = READ_ONCE(__early_cpu_boot_status);
You need cache maintenance before reading this.
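The ordering being asked for, maintenance first and load second, can be modelled in userspace roughly as follows. This is only a sketch: dcache_clean_inval_stub() is a hypothetical stand-in that counts calls instead of touching the cache (in the kernel the real operation would be something like the __flush_dcache_area() call quoted earlier), and early_boot_status_model stands in for __early_cpu_boot_status.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the status word written by the secondary CPU with its MMU off. */
volatile long early_boot_status_model;

/* Number of times the stand-in maintenance routine has run. */
int maintenance_calls;

/*
 * Hypothetical stand-in for a clean+invalidate by address. It performs
 * no cache operation; it only records that it was called.
 */
void dcache_clean_inval_stub(const volatile void *addr, size_t len)
{
	(void)addr;
	(void)len;
	maintenance_calls++;
}

/*
 * Reader side: drop any stale cached copy first, then load the value,
 * so the load observes what the MMU-off writer put in memory.
 */
long read_early_boot_status(void)
{
	dcache_clean_inval_stub(&early_boot_status_model,
				sizeof(early_boot_status_model));
	/* The maintenance must have run before the load below. */
	assert(maintenance_calls > 0);
	return early_boot_status_model;
}
```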
> +
> + switch (status) {
> + default:
> + pr_err("CPU%u: failed in unknown state : 0x%x\n",
> + cpu, status);
> + break;
> + case CPU_KILL_ME:
> + if (!op_cpu_kill(cpu)) {
> + pr_crit("CPU%u: died during early boot\n", cpu);
> + break;
> + }
> + /* Fall through */
> + pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> + case CPU_STUCK_IN_KERNEL:
> + pr_crit("CPU%u: is stuck in kernel\n", cpu);
> + cpus_stuck_in_kernel++;
> + break;
> + case CPU_PANIC_KERNEL:
> + panic("CPU%u detected unsupported configuration\n", cpu);
> + }
> + }
>
> return ret;
> }
BTW, you can send a fix-up on top of this series with the corrections and
I can fold it in.
--
Catalin