[RESEND PATCH 1/2] dt-ops: Add helper API to dump fdt blob

Tue Apr 3 06:58:11 PDT 2018

Hi James,

Sorry for the delay, I had a long weekend last week.

On Tue, Mar 27, 2018 at 7:01 PM, James Morse <james.morse at arm.com> wrote:
> Hi Akashi, Bhupesh,
>
> On 27/03/18 10:04, AKASHI Takahiro wrote:
>> On Mon, Mar 26, 2018 at 02:29:31PM +0530, Bhupesh Sharma wrote:
>>> On Tue, Mar 20, 2018 at 12:36 AM, Bhupesh Sharma <bhsharma at redhat.com> wrote:
>>>> On Mon, Mar 19, 2018 at 8:15 PM, AKASHI Takahiro
>>>> <takahiro.akashi at linaro.org> wrote:
>>>>> On Mon, Mar 19, 2018 at 04:05:38PM +0530, Bhupesh Sharma wrote:
>>>>>> On several occasions it would be useful to dump the fdt
>>>>>> blob being passed to the second (kexec/kdump) kernel
>>>>>> when '-d' flag is specified while invoking kexec/kdump.
>>>>>
>>>>> Why not just save binary data to a file and interpret it
>>>>> with dtc command?
>
> I'd prefer this too. It would also let us debug any issue where kexec-tools
> produces an invalid DTB. It also lets us test booting the kernel from firmware
> with that DTB.

I described the use case where that is not possible: I have seen the
primary kernel crash before we can get to a command prompt to save
the dtb blob. And since the arm64 crashkernel itself still seems to
have issues booting on ACPI-enabled machines (see
<https://www.spinics.net/lists/arm-kernel/msg616632.html>), we are
trying to debug a problem which has two unknown variables :)

>>>> Well, there are a couple of reasons for that which can be understood
>>>> from a system which is in a production environment (for e.g. I have
>>>> worked on one arm64 system in the past which used a yocto based
>>>> distribution in which kexec -p was launched with a -d option as a part
>>>> of initial ramfs scriptware):
>
> and panics before you get an interactive prompt or persistent storage? I think
> this would be a pretty niche use-case. You could always base64-dump the dtb to
> stdout from your script.

That is a pretty basic case on several new arm64 development boards
(e.g. Qualcomm, Huawei etc.) where we are debugging issues in the primary
kernel boot (and we are not even able to reach the command prompt).

If the crashkernel crashes even before the primary kernel does because
of issues in the way the DTB is passed to the crashkernel (which can
include wrong DTB fields), we had better have mechanisms to track that
rather than adding debug prints to the kernels.

Note that the print logs are enabled only when the -d flag is passed to
conventional 'kexec-tools' (where we already have several log
messages), so I am not sure how adding the dtb print here affects the
default (non-debug) case.
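
To illustrate how a dump can stay confined to the debug path, here is a
minimal sketch in C. It is not the code from this patch:
dump_fdt_header() and be32_at() are hypothetical names, the 'debug'
argument merely stands in for the state set by -d, and the sketch only
prints the ten header fields of the flattened devicetree format instead
of walking the nodes and properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FDT_MAGIC       0xd00dfeedu
#define FDT_HEADER_SIZE 40u   /* ten big-endian 32-bit fields */

/* Read one big-endian 32-bit value; FDT header fields are big-endian. */
uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper (an assumption, not a kexec-tools function):
 * print the FDT header fields to 'out', but only if 'debug' is non-zero.
 * Returns 0 on success (or when debug is off), -1 if the buffer is too
 * short for a header or does not start with the FDT magic.
 */
int dump_fdt_header(const void *blob, size_t len, int debug, FILE *out)
{
    static const char *const names[] = {
        "magic", "totalsize", "off_dt_struct", "off_dt_strings",
        "off_mem_rsvmap", "version", "last_comp_version",
        "boot_cpuid_phys", "size_dt_strings", "size_dt_struct",
    };
    const unsigned char *p = blob;
    size_t i;

    assert(out != NULL);

    if (!debug)
        return 0;   /* default (non-debug) case: print nothing */
    if (p == NULL || len < FDT_HEADER_SIZE || be32_at(p) != FDT_MAGIC)
        return -1;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        fprintf(out, "fdt: %-18s 0x%08x\n",
                names[i], (unsigned int)be32_at(p + 4 * i));
    return 0;
}
```

A caller would pass the final blob, its length and the debug state, e.g.
dump_fdt_header(buf, size, debug_enabled, stderr), where buf, size and
debug_enabled are placeholder names. With the flag off the function
returns before touching the blob.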

>>>> - In a production environment, installing and executing tools like dtc
>>>> (which might not have been installed by default via 'yum' or 'apt-get
>>>> install' or other means) is not only an additional step, but we might
>>>> not get a chance to run it even if it is installed if we have an early
>>>> crash in kdump itself (for e.g. consider the 'acpi table access' issue
>>>> in the arm64 crashkernel we discussed some time back
>>>> <https://www.spinics.net/lists/arm-kernel/msg616632.html>):
>
> Wouldn't it be possible to transfer the dumped DTB file off the machine before
> the behaviour that brings the machine down? Kdump always requires a setup step
> to load the kdump kernel/dtb. You have to decide to add the debug flags at this
> point. I don't see how choosing to save the modified-DTB would be any different.

It's not always possible to do that in a customer setup or on
remote board farms.
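
For setups where only the console log comes back, the base64 route
suggested earlier in the thread could be sketched in C roughly as
below. This is an illustration under assumptions, not existing
kexec-tools code: base64_encode() and print_blob_base64() are
hypothetical names, and the BEGIN/END marker lines are made up so that
the blob can be cut out of a log afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Encode 'len' bytes from 'src' as standard base64 (RFC 4648 alphabet,
 * '=' padding) into 'dst', which must hold 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the trailing NUL.
 */
size_t base64_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    assert(dst != NULL);

    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        dst[o++] = tbl[src[i] >> 2];
        dst[o++] = tbl[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        dst[o++] = tbl[((src[i + 1] & 0x0f) << 2) | (src[i + 2] >> 6)];
        dst[o++] = tbl[src[i + 2] & 0x3f];
    }
    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        dst[o++] = tbl[src[i] >> 2];
        if (i + 1 < len) {
            dst[o++] = tbl[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
            dst[o++] = tbl[(src[i + 1] & 0x0f) << 2];
        } else {
            dst[o++] = tbl[(src[i] & 0x03) << 4];
            dst[o++] = '=';
        }
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}

/*
 * Print a blob as 76-column base64 lines between marker lines.
 * 57 input bytes per line is a multiple of 3, so the lines can simply
 * be concatenated and decoded as one base64 stream.
 */
void print_blob_base64(const unsigned char *blob, size_t len, FILE *out)
{
    char line[4 * ((57 + 2) / 3) + 1];   /* 76 characters + NUL */

    fprintf(out, "-----BEGIN DTB-----\n");   /* hypothetical marker */
    while (len > 0) {
        size_t n = len < 57 ? len : 57;

        base64_encode(blob, n, line);
        fprintf(out, "%s\n", line);
        blob += n;
        len -= n;
    }
    fprintf(out, "-----END DTB-----\n");     /* hypothetical marker */
}
```

The text between the markers can then be decoded on another machine and
fed to dtc. It still relies on the console log being captured intact,
which is the same assumption the plain-text dump makes.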

> I assume the flow here is: do thingX, the kernel crashes. Enable kdump, do
> thingX, kdump fails to boot. Enable kdump-debug. Do thingX...

Please see the use case I shared above.

>
>>>> a) In such a case the primary kernel has already crashed, so we had no
>>>> opportunity to run a dtc interpreter there and the kdump kernel itself
>>>> has crashed in a very early boot phase. So we didn't get a chance to
>>>> execute 'dtc' on the kdump kernel command prompt (if the kdump scripts
>>>> are configured not to reboot the primary again).
>
> Transfer it off the machine, save it somewhere persistent or print it to stdout
> in base64.
>
>
>>>> b) Also when an early arm64 kdump crash is reported by a customer, we
>>>> usually only have access to the primary and secondary console log
>>>> which also might include the 'kexec -p -d' log messages, which can
>>>> point us to a discrepancy in dtb nodes like 'linux,usable-memory'
>>>> which might have caused an early crashkernel crash.
>
>>>> Personally, so far I have found this dtb dumping facility of use in
>>>> debugging at least a couple of arm64 crashkernel crash/panic issues.
>>>> Till the arm64 kexec/kdump implementation matures further, I feel this
>>>> dumping facility is of good use to ease crashkernel panic debugs.
>>>
>>> Ping. Do you have any further comments on it?
>>
>> No.
>> While I don't think dumping fdt is so useful as "kexec -d" option
>> outputs enough information for me, you can go ahead.
>
> (likewise)
>

Regards,
Bhupesh