[PATCH 18/19] arm64: kdump: update a kernel doc

Wed Jan 20 03:49:31 PST 2016

On Wed, Jan 20, 2016 at 03:07:53PM +0900, AKASHI Takahiro wrote:
> On 01/20/2016 11:49 AM, Dave Young wrote:
> >On 01/19/16 at 02:01pm, Mark Rutland wrote:
> >>On Tue, Jan 19, 2016 at 09:45:53PM +0800, Dave Young wrote:
> >>>On 01/19/16 at 12:51pm, Mark Rutland wrote:
> >>>>On Tue, Jan 19, 2016 at 08:28:48PM +0800, Dave Young wrote:
> >>>>>On 01/19/16 at 02:35pm, AKASHI Takahiro wrote:
> >>>>>>On 01/19/2016 10:43 AM, Dave Young wrote:
> >>>>>>>X86 takes another way in latest kexec-tools and kexec_file_load, that is
> >>>>>>>recreating E820 table and pass it to kexec/kdump kernel, if the entries
> >>>>>>>are over E820 limitation then turn to use setup_data list for remain
> >>>>>>>entries.
> >>>>>>
> >>>>>>Thanks. I will visit x86 code again.
> >>>>>>
> >>>>>>>I think it is X86 specific. Personally I think device tree property is
> >>>>>>>better.
> >>>>>>
> >>>>>>Do you think so?
> >>>>>
> >>>>>I'm not sure it is the best way. For X86 we run into problem with
> >>>>>memmap= design, one example is pci domain X (X>1) need the pci memory
> >>>>>ranges being passed to kdump kernel. When we passed reserved ranges in /proc/iomem
> >>>>>to 2nd kernel we find that cmdline[] array is not big enough.
> >>>>
> >>>>I'm not sure how PCI ranges relate to the memory map used for normal
> >>>>memory (i.e. RAM), though I'm probably missing some caveat with the way
> >>>>ACPI and UEFI describe PCI. Why does memmap= affect PCI memory?
> >>>
> >>>Here is the old patch which was rejected in kexec-tools:
> >>>http://lists.infradead.org/pipermail/kexec/2013-February/007924.html
> >>>
> >>>>
> >>>>If the kernel got the rest of its system topology from DT, the PCI
> >>>>regions would be described there.
> >>>
> >>>Yes, if kdump kernel use same DT as 1st kernel.
> >>
> >>Other than for testing purposes, I don't see why you'd pass the kdump
> >>kernel a DTB inconsistent with that the 1st kernel was passed (other
> >>than some properties under /chosen).
> >>
> >>We added /sys/firmware/fdt specifically to allow the kexec tools to get
> >>the exact DTB the first kernel used. There's no reason for tools to have
> >>to make something up.
> >
> >Agreed, but kexec-tools has an option to pass in any dtb files. Who knows
> >how one will use it unless dropping the option and use /sys/firmware/fdt
> >unconditionally.
> 
> As a matter of fact, specifying proper command line parameters as well as
> dtb is partly users' responsibility for kdump to work correctly.
> (especially for BE kernel)
> 
> >If we choose to implement kexec_file_load only in kernel, the interfaces
> >provided are kernel, initrd and cmdline. We can always use same dtb.
> 
> I would say that we can always use the same dtb even with kexec_load
> from user's perspective. Right?

No.

This breaks using kexec for boot-loader purposes, and imposes a policy.

For better or worse kexec_file_load has always imposed a constrained
Linux-only policy, so that's a different story.

> >>There's a horrible edge case I've spotted if performing a chain of
> >>cross-endian kexecs: LE -> BE -> LE, as the BE kernel would have to
> >>respect the EFI memory map so as to avoid corrupting it for the
> >>subsequent LE kernel. Other than this I believe everything should just
> >>work.
> >
> >Firmware do not know kernel endianness, kernel should respect firmware
> >maps and adapt to it, it sounds like a generic issue not specific to kexec.
> 
> On arm64, a kernel image header has a bit field to specify the image's endianness.
> Anyway, our current implementation relies on a user-supplied dtb to start BE kernel.

The firmware should _never_ care about the kernel's endianness. The
bootloader or first kernel shouldn't care about the next kernel's
endianness apart from in exceptional circumstances. The DTB for a LE
kernel should look identical to that passed to a BE kernel.

In my mind, the only valid reason to look at that bit is so that
bootloaders can provide a warning if the CPU does not implement that
endianness.
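
As a rough illustration of the kind of check a bootloader could make, here
is a minimal Python sketch that reads the endianness bit from an arm64
Image header. The layout used (64-byte little-endian header, flags at
offset 24 with bit 0 set for BE, magic "ARM\x64" at offset 56) is taken
from the arm64 boot protocol documentation rather than from this thread,
and image_is_big_endian() is a hypothetical helper name:

```python
import struct

# Assumption: header layout as described in the arm64 booting document.
ARM64_IMAGE_MAGIC = 0x644D5241  # "ARM\x64" read as a little-endian u32
HEADER_FORMAT = "<IIQQQQQQII"   # code0, code1, text_offset, image_size,
                                # flags, res2, res3, res4, magic, res5
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64 bytes


def image_is_big_endian(header):
    """Return True if the Image header's flags mark the kernel as BE.

    Hypothetical helper: only inspects bit 0 of the flags field.
    """
    if len(header) < HEADER_SIZE:
        raise ValueError("header too short")
    fields = struct.unpack(HEADER_FORMAT, header[:HEADER_SIZE])
    flags, magic = fields[4], fields[8]
    if magic != ARM64_IMAGE_MAGIC:
        raise ValueError("not an arm64 Image")
    return bool(flags & 1)


if __name__ == "__main__":
    # Build a synthetic header instead of reading a real Image file.
    fake_be = struct.pack(HEADER_FORMAT, 0, 0, 0x80000, 0, 1, 0, 0, 0,
                          ARM64_IMAGE_MAGIC, 0)
    if image_is_big_endian(fake_be):
        print("warning: BE kernel; check that the CPU implements BE")
```

The result only feeds a warning here; nothing about the DTB or the memory
map depends on it.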

The issue I mention above should be solved by changes to the BE kernel.

> >>>Is it possible to modify uefi memmap for kdump case?
> >>
> >>Technically it would be possible, however I don't think it's necessary,
> >>and I think it would be disadvantageous to do so.
> >>
> >>Describing the range(s) the crash kernel can use in separate properties
> >>under /chosen has a number of advantages.
> >
> >Ok, I got the points. We have a is_kdump_kernel() by checking if there is
> >elfcorehdr_addr kernel cmdline. This is mainly for some drivers which
> >do not work well in kdump kernel for some uncertain reasons. But ideally I
> >think kernel should handle things just like in 1st kernel and avoid to use
> >it.
> 
> So I'm not still sure about what are advantages of a property under /chosen
> over "memmap=" kernel parameter.
> Both are simple and can have the same effect with minimizing changes to dtb.
> (But if, in the latter case, we have to provide *all* the memory-related information
> through "memmap=" parameters, it would be much complicated.)

The reasons I prefer a property over command line additions include:

* It keeps the command line simple (as you mention the opposite is
  "complicated").

* It is logically separate from options the user may pass to the kernel
  in that the restricted region(s) of memory available are effectively
  properties of the system (in that the crashed OS is part of the system
  state).

* The semantics of the command line parsing can change subtly over time
  (for example, see 51e158c12aca3c9a, which terminates command line
  parsing at "--"). Making sure that a command line option will
  actually be parsed by the next kernel is not trivial.

  Keeping this information isolated from the command line is more
  robust.

* Addition of a property is a self-contained operation, that doesn't
  require any knowledge about the command line.
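
The command line parsing point can be sketched in a few lines of Python.
This is a simplified model, not the kernel's parser: kernel_params() is a
hypothetical helper that only mimics the "stop at --" behaviour mentioned
above, and the memmap= value is purely illustrative:

```python
def kernel_params(cmdline):
    """Return the tokens a kernel would parse, stopping at a bare "--".

    Simplified model: real parsing also handles quoting and more.
    """
    params = []
    for token in cmdline.split():
        if token == "--":
            break  # everything after "--" is handed to init instead
        params.append(token)
    return params


def append_option(cmdline, option):
    """Naive tool behaviour: blindly append an option to the end."""
    return cmdline + " " + option


if __name__ == "__main__":
    original = "console=ttyAMA0 root=/dev/vda2 -- single"
    # Illustrative value only; not a documented arm64 option.
    modified = append_option(original, "memmap=512M@2G")
    seen = kernel_params(modified)
    print(seen)
    print("memmap seen by kernel:",
          any(p.startswith("memmap=") for p in seen))
```

In this model the appended option lands after "--" and is never seen as a
kernel parameter, so a tool would have to understand the existing command
line to insert it correctly. Adding a property needs no such knowledge.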

Thanks,
Mark.