[PATCH v1 3/4] arm64: Add arm64 kexec support

Wed Jul 20 08:39:00 PDT 2016

Hi,

On Tue, Jul 19, 2016 at 11:28:13PM +0000, Geoff Levand wrote:
> +/**
> + * struct arm64_image_header - arm64 kernel image header.
> + *
> + * @pe_sig: Optional PE format 'MZ' signature.
> + * @branch_code: Reserved for instructions to branch to stext.
> + * @text_offset: The image load offset in LSB byte order.
> + * @image_size: An estimated size of the memory image size in LSB byte order.
> + * @flags: Bit flags:
> + *  Bit 7.0: Image byte order, 1=MSB.
> + * @reserved_1: Reserved.
> + * @magic: Magic number, "ARM\x64".
> + * @pe_header: Optional offset to a PE format header.
> + **/
> +
> +struct arm64_image_header {
> +	uint8_t pe_sig[2];
> +	uint16_t branch_code[3];
> +	uint64_t text_offset;
> +	uint64_t image_size;
> +	uint8_t flags[8];

The flags field is a 64-bit quantity, and it's rather confusing to treat
it as something else.

I think it would be better to have it as a uint64_t, and use explicit
endianness conversion as necessary to swizzle it. I beleive that's less
confusing than grabbing individual bytes.

> +static const uint64_t arm64_image_flag_7_be = 0x01U;

For this we could have:

#define ARM64_IMAGE_FLAG_BE		(1UL << 0)

> +static inline int arm64_header_check_magic(const struct arm64_image_header *h)
> +{
> +	if (!h)
> +		return 0;
> +
> +	if (!h->text_offset)
> +		return 0;

I believe that with CONFIG_RANDOMIZE_TEXT_OFFSET, it is possible that
text_offset is 0.

Regardless, I'm not sure I follow the point of this check; why isn't
checking the magic sufficient?

> +
> +	return (h->magic[0] == arm64_image_magic[0]
> +		&& h->magic[1] == arm64_image_magic[1]
> +		&& h->magic[2] == arm64_image_magic[2]
> +		&& h->magic[3] == arm64_image_magic[3]);
> +}

> +static inline int arm64_header_check_msb(const struct arm64_image_header *h)
> +{
> +	if (!h)
> +		return 0;
> +
> +	return !!(h->flags[7] & arm64_image_flag_7_be);
> +}

As above, I think this would be better as the below, perhaps wrapped
with !! if people don't like implicit bool conversion.

static inline bool arm64_header_is_be(const struct arm64_image_header *h)
{
	return le64_to_cpu(h->flags) & ARM64_IMAGE_FLAG_BE;
}

> +static int check_cpu_properties(const struct cpu_properties *cp_1,
> +	const struct cpu_properties *cp_2)
> +{
> +	assert(cp_1->hwid == cp_2->hwid);
> +
> +	if (cp_1->method != cp_2->method) {
> +		fprintf(stderr,
> +			"%s:%d: hwid-%" PRIx64 ": Error: Different cpu enable methods: %s -> %s\n",
> +			__func__, __LINE__, cp_1->hwid,
> +			cpu_enable_method_str(cp_1->method),
> +			cpu_enable_method_str(cp_2->method));
> +		return -EINVAL;
> +	}
> +
> +	if (cp_2->method != cpu_enable_method_psci) {
> +		fprintf(stderr,
> +			"%s:%d: hwid-%" PRIx64 ": Error: Unsupported cpu enable method: %s\n",
> +			__func__, __LINE__, cp_1->hwid,
> +			cpu_enable_method_str(cp_1->method));
> +		return -EINVAL;
> +	}
> +
> +	dbgprintf("%s: hwid-%" PRIx64 ": OK\n", __func__, cp_1->hwid);
> +
> +	return 0;
> +}

Does this really matter to userspace?

I agree that it makes sense to warn the user that kexec might not be
possible, but producing an error and failing doesn't seem right. Who
knows what the kernel might support in future?

> +static uint64_t read_sink(const char *command_line)
> +{
> +	uint64_t v;
> +	const char *p;
> +
> +	if (arm64_opts.port)
> +		return arm64_opts.port;
> +
> +#if defined(ARM64_DEBUG_PORT)
> +	return (uint64_t)(ARM64_DEBUG_PORT);
> +#endif
> +	if (!command_line)
> +		return 0;
> +
> +	if (!(p = strstr(command_line, "earlyprintk=")) &&
> +		!(p = strstr(command_line, "earlycon=")))
> +		return 0;
> +
> +	while (*p != ',')
> +		p++;
> +
> +	p++;
> +
> +	while (isspace(*p))
> +		p++;

Why do we skip spaces? As far as I am aware, there should not be any
spaces in the option.

> +
> +	if (*p == 0)
> +		return 0;
> +
> +	errno = 0;
> +
> +	v = strtoull(p, NULL, 0);
> +
> +	if (errno)
> +		return 0;
> +
> +	return v;
> +}

It looks like the purgatory code expects angel SWI as the earlycon,
whereas many other earlycons exist (with pl011 being extremely popular).

Regardless, if we assume a particular UART type, we should explicitly
verify that here. Otherwise the purgatory code will likely bring down
the system, and it will be very painful to debug.

Please explicitly check for the supported earlycon name.

> +
> +/**
> + * arm64_load_other_segments - Prepare the dtb, initrd and purgatory segments.
> + */
> +
> +int arm64_load_other_segments(struct kexec_info *info,
> +	uint64_t kernel_entry)
> +{
> +	int result;
> +	uint64_t dtb_base;
> +	uint64_t image_base;
> +	unsigned long hole_min;
> +	unsigned long hole_max;
> +	uint64_t purgatory_sink;
> +	char *initrd_buf = NULL;
> +	struct dtb dtb_1 = {.name = "dtb_1"};
> +	struct dtb dtb_2 = {.name = "dtb_2"};
> +	char command_line[COMMAND_LINE_SIZE] = "";
> +
> +	if (arm64_opts.command_line) {
> +		strncpy(command_line, arm64_opts.command_line,
> +			sizeof(command_line));
> +		command_line[sizeof(command_line) - 1] = 0;
> +	}
> +
> +	purgatory_sink = read_sink(command_line);
> +
> +	dbgprintf("%s:%d: purgatory sink: 0x%" PRIx64 "\n", __func__, __LINE__,
> +		purgatory_sink);
> +
> +	if (arm64_opts.dtb) {
> +		dtb_2.buf = slurp_file(arm64_opts.dtb, &dtb_2.size);
> +		assert(dtb_2.buf);
> +	}
> +
> +	result = read_1st_dtb(&dtb_1, command_line);
> +
> +	if (result && !arm64_opts.dtb) {
> +		fprintf(stderr, "kexec: Error: No device tree available.\n");
> +		return result;
> +	}
> +
> +	if (result && arm64_opts.dtb)
> +		dtb_1 = dtb_2;
> +	else if (!result && !arm64_opts.dtb)
> +		dtb_2 = dtb_1;
> +
> +	result = setup_2nd_dtb(command_line, &dtb_2);
> +
> +	if (result)
> +		return result;
> +	
> +	result =  check_cpu_nodes(&dtb_1, &dtb_2);
> +
> +	if (result)
> +		fprintf(stderr, "kexec: Warning: No device tree available.\n");

There are other reasons we'd return an error (e.g. mismatched enable
methods), so this is somewhat misleading.

I believe that in all cases we log the specific reason first anyway, so
perhaps it's best to jsut remove this warning.

Won't this also be very noisy in the case of ACPI with a stub DTB? In
that case ther are no cpu nodes, and may be no memory nodes.

[...]

> +int arm64_process_image_header(const struct arm64_image_header *h)
> +{
> +#if !defined(KERNEL_IMAGE_SIZE)
> +# define KERNEL_IMAGE_SIZE (768 * 1024)
> +#endif
> +
> +	if (!arm64_header_check_magic(h))
> +		return -EINVAL;
> +
> +	if (h->image_size) {
> +		arm64_mem.text_offset = le64_to_cpu(h->text_offset);
> +		arm64_mem.image_size = le64_to_cpu(h->image_size);
> +	} else {
> +		/* For 3.16 and older kernels. */
> +		arm64_mem.text_offset = 0x80000;
> +		arm64_mem.image_size = KERNEL_IMAGE_SIZE;
> +	}
> +
> +	return 0;
> +}

A v3.16 defconfig Image with the Linaro 14.09 GCC 4.9 toolchain is
6.3MB, so the chosen value for KERNEL_IMAGE_SIZE is far too small. I'm
not sure what to suggest as a better value, however, as I know that some
configurations are far bigger than that.

Do we expect to kexec to a v3.16 or earlier kernel, given we need a much
newer first kernel to have kexec in the first place? We could mandate
having a header with a non-zero image_size (i.e. the target kernel has
to be v3.16 or newer).

Thanks,
Mark.