[PATCH v3 00/17] kexec: Allow preservation of ftrace buffers

Tue Feb 6 06:40:22 PST 2024

On Tue, Feb 06, 2024 at 02:43:15PM +0100, Alexander Graf wrote:
> Hey Oleksij!
> 
> On 06.02.24 09:17, Oleksij Rempel wrote:
> > Hi Alexander,
> > 
> > Nice work!
> > 
> > On Wed, Jan 17, 2024 at 02:46:47PM +0000, Alexander Graf wrote:
> > > Make sure to fill ftrace with contents that you want to observe after
> > > kexec.  Then, before you invoke file based "kexec -l", activate KHO:
> > > 
> > >    # echo 1 > /sys/kernel/kho/active
> > >    # kexec -l Image --initrd=initrd -s
> > >    # kexec -e
> > > 
> > > The new kernel will boot up and contain the previous kernel's trace
> > > buffers in /sys/kernel/debug/tracing/trace.
> > Assuming:
> > - we want to start tracing as early as possible, before rootfs
> >    or initrd would be able to configure it.
> > - traces are stored on a different device, not RAM. For example NVMEM.
> > - Location of NVMEM is different for different board types, but
> >    bootloader is able to give the right configuration to the kernel.
> 
> 
> Let me try to really understand what you're tracing here. Are we talking
> about exposing boot loader traces into Linux [1]? In that case, I think a
> mechanism like [2] is what you're looking for.
> 
> Or do you want to transfer genuine Linux ftrace traces? In that case, why
> would you want to store them outside of RAM?

The high-level objective is to find out how embedded systems break in
the field. Since these devices should be always on, there are different
situations in which the system may reboot. For example, voltage-related
issues, temperature, scheduled system updates, HW or SW errors.

To better understand what is going on, information should be
collected. But there are some limitations:
- Voltage drops can be recorded only with prepared HW:
  https://www.spinics.net/lists/devicetree/msg644030.html

- In case of voltage drops, RAM or block devices can't be used. Instead,
  some variant of NVMEM should be used. In my case, NVMEM has 8 bits of
  storage :) So, only one entry of the "trace" is compressed into this storage.
  https://lore.kernel.org/all/20240124122204.730370-1-o.rempel@pengutronix.de
  The reset reason information is provided by the kernel and used by the
  firmware and the kernel on the next reboot.
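
As a rough illustration of the 8-bit constraint, packing a single reset
reason into such a cell could look like the sketch below. The reason names
and their numeric values are hypothetical and are not taken from the patch
series; the actual NVMEM read/write path is left out.

```c
#include <stdint.h>

/* Hypothetical reason codes, for illustration only. */
enum reset_reason {
	RESET_REASON_UNKNOWN = 0,
	RESET_REASON_UNDER_VOLTAGE,
	RESET_REASON_OVER_TEMPERATURE,
	RESET_REASON_SCHEDULED_UPDATE,
	RESET_REASON_HW_ERROR,
	RESET_REASON_SW_ERROR,
	RESET_REASON_COUNT
};

/* Pack a reason into the single byte the NVMEM cell can hold. */
static uint8_t reset_reason_encode(unsigned int reason)
{
	if (reason >= RESET_REASON_COUNT)
		return RESET_REASON_UNKNOWN;
	return (uint8_t)reason;
}

/* Decode the byte read back on the next boot; unknown values map to UNKNOWN. */
static enum reset_reason reset_reason_decode(uint8_t cell)
{
	if (cell >= RESET_REASON_COUNT)
		return RESET_REASON_UNKNOWN;
	return (enum reset_reason)cell;
}
```

Out-of-range values decode to "unknown", so a cell that was never written
or got corrupted during a brown-out does not produce a bogus reason.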

The implementation is not a big deal. The problematic part is how the
system should learn about the existence of the recorder and where the
recorder should store things, for example in an NVMEM cell.

In my initial implementation I used devicetree to configure the
software-based recorder and linked it with an NVMEM cell. But this goes
against the purpose of DT, which is to describe only HW, and it makes
this recorder unusable on non-DT-based systems.

Krzysztof is suggesting to configure it from initrd. This has its own
limitations as well:
 - the recorder can't be used before initrd.
 - we have multiple configuration points for board-specific information:
   firmware (bootloader) and initrd.
 - initrd takes up space and increases boot time for devices which did
   not need it before.

Other variants like the kernel command line and/or module parameters seem
to be unacceptable, depending on the maintainer. So, I'm still seeking a
proper, acceptable, portable way to hand over non-HW-specific
information to the kernel.

> > What would be the best way, acceptable for mainline, to provide this
> > kind of configuration? At least part of this information does not
> > describe devices or device states, so it would not fit into the devicetree
> > universe. The amount of possible information would not fit into bootconfig
> > either.
> 
> 
> We have precedent for configuration in device tree: You can use device tree
> to describe partitions on a NAND device, you can use it to specify MAC
> address overrides of devices attached to USB, etc etc. At the end of the day
> when people say they don't want configuration in device tree, what they mean
> is that device tree should be a hand over data structure from firmware to
> kernel, not from OS integrator to kernel :). If your firmware is the place
> that knows about offsets and you need to pass those offsets, IMHO DT is a
> good fit.

Yes, the layout of the NVMEM can be described in the DT. But how can I tell
the system that this NVMEM cell should be used by some recorder or
tracer, before sysfs is available? @Krzysztof ?

> > Another, more or less overlapping, use case I have in mind is a netbootable
> > embedded system with a requirement to boot as fast as possible. Since the
> > bootloader has already established a link and obtained all needed IP
> > configuration, it would be able to hand over the ethernet controller and IP
> > configuration states. Would KHO be the way to go for this use case?
> 
> 
> That's an interesting one too. I would lean towards "try with normal device
> tree first" here as well. It's again a very clear case of "firmware wants to
> tell OS about things it knows, but the OS doesn't know" to me. That means
> device tree should be fine to describe it.

I can imagine a description of the PHY and MAC state. But the IP
configuration state of the firmware seems to be out of DT scope?

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |